
[Remote] Data Engineer GCP Airflow BigQuery

DS STREAM

Location: Warsaw, Katowice, Kraków, Wrocław, Poznań

20 160 - 23 520 PLN (B2B)

Requirements

  • Google Cloud Platform
  • ETL
  • Airflow
  • Python
  • SQL
  • Spark
  • DevOps
  • CI/CD
  • Kubernetes (nice to have)
  • MLOps (nice to have)
  • AI (nice to have)
  • Scala (nice to have)
  • REST API (nice to have)
  • Terraform (nice to have)
  • VertexAI (nice to have)

Job description

About the project:

We’re looking for an experienced Data Engineer to build, scale, optimize, and maintain reliable data platforms. You’ll work on real-time pipelines, ML infrastructure, attribution data processing, and cloud-native solutions in a high-volume production environment.

Tech stack:

  • GCP
  • Apache Spark, dbt, BigQuery
  • Python, SQL, Scala
  • Apache Airflow
  • Terraform, GitHub Actions, Docker, Kubernetes
  • Vertex AI

Requirements:

Must have:

  • Over 3 years of experience in Data Engineering and building production-scale data platforms.
  • Strong programming skills in Python and SQL.
  • Advanced experience with data modeling, ETL development, and multiple data formats.
  • Strong knowledge of Google Cloud Platform and cloud-native architecture design.
  • Expert knowledge of Apache Airflow for workflow orchestration and automation.
  • Hands-on experience with Apache Spark for batch and high-volume streaming workloads.
  • Good understanding of CI/CD and DevOps tools such as GitHub Actions or Kubernetes.

Nice to have:

  • Experience with MLOps and ML deployment in production, preferably with Vertex AI.
  • Practical experience with Terraform for Infrastructure as Code.
  • Experience building scalable REST APIs for data or ML services.
  • Strong testing practices, including TDD and unit/integration tests for Spark and Airflow.
  • Knowledge of Scala for high-performance data processing.

Daily tasks:

  • Build and maintain large-scale advertising data pipelines for clicks, impressions, attribution, and event processing.
  • Improve attribution modeling logic to support better optimization and business decisions.
  • Ensure high availability, performance, stability, and scalability of real-time data workflows.
  • Design and support production-grade ML model serving infrastructure.
  • Develop and manage dbt datasets for model training, experimentation, and Feature Store integration.