Senior AI/ML Platform Engineer, Cloud Architecture
Instructure Hungary Ltd
Budapest
20 811 - 25 436 PLN (PERMANENT)
Requirements
- AWS architecture
- production infrastructure
- system design
- distributed systems
- CI/CD and deployment patterns
- containerization and orchestration
- reliability and observability
- infrastructure-as-code
- production ML or AI service deployment
Job description
About the project:
Instructure is building foundational AI and machine learning capabilities that will power the next generation of learning experiences across our product ecosystem. We are looking for a Senior AI/ML Platform Architect / Engineer to design and build the AWS-native infrastructure layer that enables data scientists, ML engineers, and applied AI teams to move from prototype to production safely, reliably, and at scale.
This is not a research role and not a traditional DevOps role. It is a hands-on systems architecture role for someone who understands cloud infrastructure, production reliability, and the unique needs of AI/ML workloads. You will partner closely with data science, applied AI, backend engineering, platform engineering, and product teams to translate emerging AI/ML needs into scalable, modular, production-grade systems.
Requirements: What We’re Looking For
- Strong experience designing and operating production systems on AWS.
- Deep understanding of distributed systems, cloud architecture, scalability, reliability, and service design.
- Hands-on experience with infrastructure-as-code, CI/CD, Docker, Kubernetes, and production deployment workflows.
- Experience building or supporting production ML, AI, data, or high-scale backend systems.
- Strong system design skills, including the ability to reason about tradeoffs, failure modes, data flow, service boundaries, and operational complexity.
- Ability to communicate clearly across data science, ML engineering, backend engineering, platform engineering, product, and leadership stakeholders.
Nice to Have
- Experience with SageMaker, Bedrock, ECS, EKS, Lambda, S3, RDS, OpenSearch, Aurora, EventBridge, Step Functions, or related AWS services.
- Experience with model serving, batch inference, embedding pipelines, vector databases, RAG systems, or LLM-backed applications.
- Experience building ML platform capabilities such as model registries, experiment tracking, evaluation pipelines, inference services, or model monitoring.
- Experience supporting both real-time and batch AI/ML workloads.
- Experience with workflow orchestration, data pipelines, and production evaluation frameworks.
- Experience defining production-readiness standards for AI systems, including evaluation gates, model/version drift, data quality checks, and cost monitoring.
You Might Be a Great Fit If
- You enjoy designing systems from first principles and can explain architecture tradeoffs clearly.
- You have built infrastructure that other engineers depend on.
- You are comfortable operating at both architecture and implementation levels.
- You understand that ML systems involve data, models, evaluation, versioning, latency, uncertainty, and operational risk.
- You can take an ambiguous AI/ML need and turn it into a practical technical architecture.
- You care deeply about scale, reliability, modularity, maintainability, and developer experience.
What Success Looks Like
- You understand the AI/ML team’s workflows, infrastructure gaps, and production bottlenecks.
- You define reusable architecture patterns that help AI/ML services move from prototype to production.
- You establish reliable deployment, monitoring, rollback, and operational standards for AI/ML systems.
- You reduce friction for data scientists and applied AI engineers by creating clear production pathways.
- You help teams build AI/ML systems that are scalable, secure, observable, and maintainable.
- Your work becomes part of the foundation for scaling AI capabilities across Instructure products.
Onsite Collaboration Requirement: This role requires working onsite on Tuesday and Wednesday, with Thursday strongly encouraged, as part of our company’s in-person collaboration model.
Daily tasks:
- Design and build the AWS-native production infrastructure for AI/ML services, including deployment, observability, reliability, and operational readiness.
- Partner with ML, data science, and applied AI teams to understand their workflows and translate prototype needs into scalable architecture.
- Create reusable infrastructure patterns, service templates, CI/CD pipelines, and deployment workflows for AI/ML workloads.
- Architect systems for model serving, batch inference, retrieval pipelines, evaluation workflows, and AI service deployment.
- Define production standards for monitoring, alerting, rollback, logging, versioning, and reliability of AI/ML systems.
- Collaborate with platform and backend engineering teams to ensure AI/ML infrastructure aligns with broader company architecture and security standards.