JustJoin.IT · Remote work · Mid · New

Data/MLOps Engineer – CT&C

Upvanta sp. z o.o.

⚲ Wrocław, Warszawa, Gdańsk, Poznań, Kraków

25 200 - 27 300 PLN net (B2B)

Requirements

  • Python
  • Apache Spark
  • PySpark
  • MLOps
  • AWS
  • Amazon SageMaker
  • AWS Lambda
  • AWS CDK
  • SQL
  • ETL

Job description

We are looking for an experienced and passionate Data/MLOps Engineer to join our CT&C Engineering team. In this role, you will bridge the gap between Data Science and Production Engineering, ensuring that machine learning solutions are scalable, reliable, secure, and production-ready.
You will play a key role in designing, building, maintaining, and optimizing our data platforms and ML infrastructure, enabling efficient data ingestion, transformation, storage, model deployment, and real-time analytics.
This position requires a strong understanding of machine learning concepts, hands-on MLOps expertise, and solid engineering skills across cloud platforms, data processing frameworks, and automation tooling.
Key Responsibilities
ML & Data Infrastructure
• Deploy, maintain, and optimize end-to-end machine learning lifecycles, including automated training, deployment, monitoring, and versioning.
• Build and support core MLOps capabilities such as Feature Stores, Experiment Tracking platforms, and Model Registries.
• Provision and manage scalable cloud infrastructure using Infrastructure as Code (IaC) solutions such as Terraform or AWS CloudFormation.
• Design and implement robust CI/CD/CT (Continuous Training) pipelines to enable reliable and repeatable production releases.
• Collaborate closely with Data Scientists to productionize machine learning models and workflows.
Data Engineering & Pipeline Optimization
• Design and develop high-volume data ingestion and processing pipelines using Apache Spark, PySpark, and Python.
• Build scalable ETL/ELT solutions supporting advanced analytics and machine learning workloads.
• Implement optimized data models and storage strategies to support low-latency model inference and high-performance analytics.
• Integrate automated data quality validation, monitoring, and observability capabilities across data platforms.
Governance, Monitoring & Security
• Implement proactive monitoring for model performance, model drift, data quality issues, and system latency.
• Ensure complete reproducibility through robust versioning of data, code, models, and artifacts.
• Apply security best practices across the ML lifecycle, including access management, data privacy, and compliance requirements.
• Support operational excellence through incident management, troubleshooting, and continuous improvement initiatives.
Agile Delivery & Collaboration
• Work within Agile delivery teams, participating in sprint planning, backlog refinement, daily stand-ups, and retrospectives.
• Translate business and data science requirements into scalable technical solutions.
• Collaborate with Product Owners, Data Scientists, Data Engineers, and Platform Teams to deliver production-grade ML solutions.
• Create and maintain technical documentation covering architecture, workflows, pipelines, and operational procedures.
What We're Looking For:
• Strong Python development experience
• Hands-on experience with Apache Spark and PySpark
• Solid understanding of machine learning lifecycle management and MLOps best practices
• Experience with AWS services, particularly:
  • Amazon SageMaker
  • AWS Lambda
  • AWS CDK
• Experience building CI/CD pipelines for data and ML workloads
• Strong SQL skills
• Experience designing and implementing ETL/ELT pipelines
• Knowledge of PyTorch and other machine learning frameworks
• Experience with Infrastructure as Code (Terraform and/or CloudFormation)
• Understanding of monitoring, observability, and production support practices
• Experience working in Agile environments
• Ability to design and implement scalable ML solutions using PySpark and Amazon SageMaker
• Ability to balance software engineering best practices with practical machine learning implementation
• Ability to drive operational excellence across the entire ML lifecycle
• Experience with Feature Stores and Model Registry platforms
• Experience implementing Continuous Training (CT) pipelines
• Knowledge of MLOps governance frameworks
• Experience with real-time streaming architectures
• Exposure to large-scale cloud-native data platforms

🔍 Job Ad Decoder

🔴
bridge the gap between Data Science and Production Engineering
You will probably have to translate Data Scientists' needs into terms engineers understand and vice versa, often solving problems caused by a lack of close collaboration between these teams.
🔴
ensuring that machine learning solutions are scalable, reliable, secure, and production-ready
This means you will be responsible for every technical aspect of model deployment, from infrastructure to monitoring, often with limited documentation or support.
🔴
play a key role in designing, building, maintaining, and optimizing our data platforms and ML infrastructure
This may mean building everything from scratch or working with existing, potentially imperfect infrastructure, which requires a high degree of autonomy.
🔴
Collaborate closely with Data Scientists to productionize machine learning models and workflows
Your work will depend heavily on the quality and readiness of the models delivered by Data Scientists, which can lead to frustration when those models are not production-ready.
🟡
high-volume data ingestion and processing pipelines
Working with large data volumes can bring challenges related to performance, cost, and infrastructure complexity.