Data Cloud Engineer
BlueSoft
⚲ Warsaw
Requirements
- Big Data
- Spark
Job description
Listen carefully, I shall say zis only once… 📡 For one of our international clients, we are building a modern cloud data platform that processes large volumes of streaming, audience, advertising, and content metadata in near real time. If words like Kafka lag, partition strategy, or schema evolution do not terrify you — bienvenue.

Project Overview
• Building a scalable data platform for a global company supporting streaming, analytics, and real-time audience insights
• Developing cloud-native data platforms based on AWS, event-driven architecture, and distributed processing (experience with other clouds is also highly desirable)
• Implementing batch and streaming pipelines for advertising, telemetry, and content metadata workloads
• Designing solutions using Kafka, Spark, Airflow, S3, Glue, Athena, and modern lakehouse/data mesh patterns
• Driving reliability, observability, and governance across a multi-region data platform
• Collaborating with Data Engineers, ML teams, Platform Engineers, and people asking “is the dashboard ready yet?” every 17 minutes 😉

Your Responsibilities
• Designing and developing end-to-end pipelines using Python, Spark, Kafka, and Airflow
• Building and optimizing AWS-based solutions leveraging S3, Glue, Lambda, Athena, EKS, Kinesis, and Redshift
• Implementing data ingestion pipelines for both batch and streaming workloads, with a focus on throughput, latency, and fault tolerance
• Managing the data lifecycle: schema evolution, partitioning, metadata, lineage, and retention policies
• Monitoring, troubleshooting, and tuning platform performance (aka classic “it works on my cluster” debugging)

Requirements
• Commercial experience as a Data Engineer / Data Platform Engineer / Cloud Engineer, etc.
• Strong hands-on expertise with AWS and modern cloud data ecosystems
• Practical experience with Kafka, Spark, Airflow, and Python
• Solid understanding of distributed systems, event-driven architectures, and large-scale data processing
• Experience with Terraform, Kubernetes/EKS, CI/CD, and observability tooling
• Ability to work with large-scale datasets and the awareness that a “temporary workaround” usually means “see you again in the near future” 😉