Data Engineer
Pretius
⚲ Warszawa
140 - 170 PLN/day net (B2B)
Requirements
- PostgreSQL
- AWS
- Airflow
- CI/CD
- ETL/ELT
- Azure
- BigQuery
- SQL
- Python
- Apache Spark
Job description
At Pretius, we are looking for a Data Engineer to join a project building a global-scale platform in the field of gaming and lotteries.

Project / Role
- Design, build, and maintain scalable, production-grade data pipelines using Python (ETL/ELT) and orchestration tools
- Write and optimize advanced SQL queries for efficient data extraction, transformation, and performance tuning
- Design and implement scalable data models (star/snowflake schema) for analytics and reporting
- Build and maintain end-to-end data warehouse solutions, including batch and near-real-time ingestion, data marts, and semantic layers
- Work with Apache Spark (PySpark, Spark SQL) for large-scale data processing and analytics
- Develop and operate cloud data solutions across AWS, Azure, and/or GCP (e.g., S3, Glue, EMR, Redshift, ADLS, Data Factory, Synapse, BigQuery)
- Design scalable, secure, and cost-efficient data architectures with FinOps awareness
- Build and maintain reliable data pipelines using orchestration tools (Airflow, ADF, Prefect, Dagster) with proper scheduling, retries, and monitoring
- Ensure data reliability through validation, monitoring, idempotent design, and failure recovery mechanisms
- Develop streaming and real-time data pipelines using Kafka, Kinesis, Pub/Sub, or Event Hubs where required
- Implement data quality, governance, and security standards (PII protection, encryption, RBAC, data lineage)
- Apply DevOps practices including Git, CI/CD, Infrastructure as Code, and production monitoring
- Integrate external APIs and SaaS data sources into data platforms

Requirements
- 8+ years of experience in data engineering, analytics engineering, or similar data-focused roles
- Expert-level proficiency in Python for data processing, pipeline development, and automation
- Advanced SQL skills, including query optimization and complex analytical transformations
- Strong experience with relational and analytical databases (e.g., PostgreSQL, Snowflake, BigQuery, Redshift, Synapse)
- Hands-on experience designing and implementing data warehouse architectures (ETL/ELT, batch, near-real-time)
- Proven experience with big data processing frameworks such as Apache Spark (PySpark, Spark SQL)
- Strong cloud experience across AWS, Azure, and/or GCP, including core data services
- Experience building and operating scalable data pipelines using orchestration tools (Airflow, ADF, Prefect, Dagster)
- Understanding of distributed systems principles and large-scale data processing challenges
- Strong knowledge of data quality, governance, security, and compliance best practices
- Experience with DevOps practices, including CI/CD, Git, and Infrastructure as Code (Terraform or equivalent)
- Ability to design scalable, production-grade data solutions in complex enterprise environments

Nice to have
- Familiarity with streaming technologies (Kafka, Kinesis, Pub/Sub)
- Experience with dbt and BI tools (Power BI, Tableau, Looker)

What do we offer?
- We focus on long-term relationships based on fair principles and reliability
- Co-financing of the Multisport card and Medicover private healthcare
- Modern office available
- Team bonding activities, internal courses, conferences, certifications