
[Remote] Data Engineer GCP Airflow BigQuery

DS STREAM

Location: Warsaw, Katowice, Kraków, Wrocław, Poznań

20 160 - 23 520 PLN (B2B)

Requirements

  • Google Cloud Platform
  • ETL
  • Airflow
  • Python
  • SQL
  • Spark
  • DevOps
  • CI/CD
  • Kubernetes (nice to have)
  • MLOps (nice to have)
  • AI (nice to have)
  • Scala (nice to have)
  • REST API (nice to have)
  • Terraform (nice to have)
  • VertexAI (nice to have)

Job description

About the project:

We’re looking for an experienced Data Engineer to build, scale, optimize, and maintain reliable data platforms. You’ll work on real-time pipelines, ML infrastructure, attribution data processing, and cloud-native solutions in a high-volume production environment.

Tech stack:

  • GCP
  • Apache Spark, dbt, BigQuery
  • Python, SQL, Scala
  • Apache Airflow
  • Terraform, GitHub Actions, Docker, Kubernetes
  • Vertex AI

Requirements:

Must have:

  • Over 3 years of experience in Data Engineering and building production-scale data platforms.
  • Strong programming skills in Python and SQL.
  • Advanced experience with data modeling, ETL development, and multiple data formats.
  • Strong knowledge of Google Cloud Platform and cloud-native architecture design.
  • Expert knowledge of Apache Airflow for workflow orchestration and automation.
  • Hands-on experience with Apache Spark for batch and high-volume streaming workloads.
  • Good understanding of CI/CD and DevOps tools such as GitHub Actions or Kubernetes.

Nice to have:

  • Experience with MLOps and ML deployment in production, preferably with Vertex AI.
  • Practical experience with Terraform for Infrastructure as Code.
  • Experience building scalable REST APIs for data or ML services.
  • Strong testing practices, including TDD and unit/integration tests for Spark and Airflow.
  • Knowledge of Scala for high-performance data processing.

Daily tasks:

  • Build and maintain large-scale advertising data pipelines for clicks, impressions, attribution, and event processing.
  • Improve attribution modeling logic to support better optimization and business decisions.
  • Ensure high availability, performance, stability, and scalability of real-time data workflows.
  • Design and support production-grade ML model serving infrastructure.
  • Develop and manage dbt datasets for model training, experimentation, and Feature Store integration.