JustJoin.IT · Remote work · Senior

Data Engineer – Data Lake (f/m/x)

Sii

⚲ Białystok, Bydgoszcz, Kraków, Gdańsk, Poznań, Katowice, Warszawa, Wrocław, Piła, Łódź, Lublin, Rzeszów, Szczecin, Toruń

Requirements

  • Docker
  • ETL/ELT
  • Kubernetes
  • Amazon AWS
  • Python
  • Apache Spark

Job description

You will join an international project within the healthcare and life sciences industry, focused on building and evolving a modern Data Lake platform that supports large-scale data processing and analytics. The solution enables data-driven decision-making in a highly regulated environment, with a strong emphasis on data quality, security, and compliance. The environment is cloud-based and leverages modern big data technologies and engineering best practices.

As a Data Engineer, you will be responsible for designing, developing, and maintaining data pipelines and the Data Lake architecture. You will work closely with cross-functional teams, including data scientists and business stakeholders, to deliver reliable and efficient data solutions.

Your tasks

  • Designing and developing scalable data pipelines for batch and real-time data processing
  • Building and optimizing the Data Lake architecture for analytical use cases
  • Integrating multiple data sources and ensuring seamless data flow across systems
  • Ensuring data quality, consistency, and governance (data lineage, access control)
  • Optimizing storage and processing performance using modern data formats and partitioning strategies
  • Monitoring, troubleshooting, and improving data pipeline performance
  • Collaborating with stakeholders to translate business needs into technical solutions
  • Following data engineering best practices and continuously improving the platform

Requirements

  • Strong experience in Data Engineering or Big Data-related roles
  • Proficiency in Python, Scala, or Java
  • Hands-on experience with tools such as Apache Spark, PySpark, or similar frameworks
  • Previous work with Data Lake technologies (e.g., AWS S3, Azure Data Lake, Databricks, BigQuery)
  • Knowledge of ETL/ELT processes and orchestration tools (e.g., Airflow, Data Factory)
  • Good understanding of SQL and data modeling
  • Experience with distributed systems and large-scale data processing
  • Familiarity with Docker and Kubernetes
  • Strong analytical and problem-solving skills
  • Fluency in Polish (required)
  • Residence in Poland (required)

Nice-to-have requirements

  • Experience with streaming technologies (e.g., Kafka)
  • Knowledge of data governance tools
  • Familiarity with CI/CD processes in data projects