Senior Engineer AWS AI & MLOps
Aptiv
Kraków, Podgórze
Requirements
- AWS
- Python
- Terraform
- Amazon EKS
- Kubernetes
- Apache Airflow
- MLflow
- GitHub
- Ray
Job description
Our requirements:

Technical Skills
- Strong experience with AWS cloud architecture for ML workloads
- Hands-on expertise in:
  - Multi-GPU training (Ray or equivalent distributed frameworks)
  - EKS / Kubernetes
  - Infrastructure as Code (Terraform)
  - Airflow, MLflow
- Proficient Python programming for ML and platform automation
- Experience building and operating CI/CD pipelines (GitHub-based)

Machine Learning Competence
- Deep understanding of ML training pipelines, including:
  - Data ingestion and preprocessing
  - Data quality assurance
  - Train/test validation strategies
- Experience supporting large-scale ML experimentation and productionization

Automotive & Algorithmic Understanding
- Solid understanding of:
  - Perception systems
  - Behavioral / decision-making algorithms
- Prior experience in automotive, ADAS, or autonomous driving environments is required
- Familiarity with the constraints of safety-critical and real-time systems

About the project:

AI & MLOps Architect - Autonomous Driving

We are seeking an AI & MLOps Architect to design, build, and scale robust, production-grade MLOps infrastructure for L2++ autonomous driving systems operating in complex urban environments.

You will be responsible for end-to-end ML platform architecture on AWS, enabling scalable training, validation, deployment, and observability of perception and behavioral models that meet automotive-grade reliability, safety, and performance standards.

This role sits at the intersection of machine learning engineering, cloud architecture, and automotive AI systems, and requires deep technical leadership across ML training pipelines, infrastructure automation, and multi-region scalability.
Responsibilities:

MLOps & Cloud Architecture
- Design and own end-to-end MLOps architecture on AWS for autonomous driving workloads
- Architect multi-zone, highly available ML platforms supporting urban L2++ hands-off use cases
- Build and operate scalable multi-GPU training environments using Ray clusters
- Define infrastructure standards for compute management, networking, storage, and security

AWS Platform & Infrastructure
- Implement and manage:
  - Amazon EKS / Kubernetes (K8s) for ML workloads
  - VPC architecture, subnets, routing, and network isolation
  - S3 Intelligent-Tiering for cost-efficient storage of large-scale sensor and training data
  - AWS Lambda for event-driven ML workflows and automation
  - AWS IoT infrastructure provisioned via Terraform
- Ensure strong multi-zone resilience, fault tolerance, and disaster recovery strategies

MLOps Pipelines & Tooling
- Design and operate ML pipelines using:
  - Apache Airflow for orchestration
  - MLflow for experiment tracking, model versioning, and lifecycle management
- Implement CI/CD pipelines for ML and infrastructure using GitHub
- Enable reproducible, traceable, and auditable ML workflows aligned with automotive standards

Machine Learning Engineering
- Enable scalable data ingestion and processing pipelines for sensor-rich datasets
- Establish data quality checks, validation frameworks, and train/test split governance
- Support ML teams with optimized workflows for training, evaluation, and deployment
- Collaborate on best practices for training at scale, including performance tuning and cost optimization

Algorithmic & Domain Collaboration
- Work closely with ML researchers and engineers on:
  - Perception algorithms (vision, sensor fusion, object detection, tracking)
  - Behavioral and decision-making algorithms
- Translate algorithmic requirements into production-ready infrastructure
- Apply automotive domain knowledge to ensure platform suitability for safety-critical systems

Observability, Scalability & Operations
- Build strong monitoring, logging, and observability for ML systems and infrastructure
- Enable performance metrics, failure detection, and operational insights across the ML lifecycle
- Continuously improve platform scalability, reliability, and operational efficiency

We offer:
- Private health care (Signal Iduna) and life insurance for you and your loved ones
- Well-Being Program that includes regular webinars, workshops, and networking events
- Hybrid work (min. 47 days/yr of remote work, flexible working hours)
- Employee Pension Plan paid by the employer (you get an additional 3.5% on each gross salary)
- Access to sports groups and a Multisport card