JustJoin.IT On-site Mid New

AI Infrastructure Nutanix Site Reliability Engineer

emagine Polska

⚲ Riyadh

Requirements

  • automation
  • Amazon Web Services (AWS)
  • Machine Learning (ML)
  • Artificial Intelligence (AI)
  • Docker
  • Cloud
  • Grafana
  • Microsoft Azure
  • DevOps
  • CI/CD

Job description

Job Title: AI Infrastructure Nutanix Site Reliability Engineer
Location: Saudi Arabia
Nationality: Saudi Nationals only
Experience: 5+ years

Job Overview:

We are seeking an experienced AI Infrastructure Site Reliability Engineer to support and optimize large-scale, distributed systems for a leading global technology client. The role focuses on ensuring high availability, scalability, and performance of AI-driven infrastructure in a Nutanix-based environment. It covers end-to-end infrastructure management, including hardware provisioning, firmware, OS, networking, storage, GPU tuning, and monitoring.

Main Responsibilities:

  • Manage and maintain AI infrastructure on Nutanix platforms.
  • Ensure system reliability, uptime, and performance through monitoring and automation.
  • Troubleshoot infrastructure, network, and application issues.
  • Implement CI/CD pipelines and infrastructure-as-code practices.
  • Collaborate with engineering teams to improve system resilience and scalability.
  • Optimize cloud and on-prem environments for AI/ML workloads.

Key Requirements:

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
  • Strong experience with Nutanix (AHV, AOS, Prism).
  • Strong understanding of hardware, OS, networking, and storage systems.
  • Experience with GPU environments and performance tuning.
  • Knowledge of cloud platforms (AWS, Azure, or GCP).
  • Experience with containerization (Docker, Kubernetes).
  • Proficiency in scripting (Python, Bash, or similar).
  • Familiarity with monitoring tools (Prometheus, Grafana, etc.).
  • Understanding of AI/ML infrastructure is a plus.

Other Details:

This position is based in Saudi Arabia and is open to Saudi nationals only. The role demands significant expertise in AI infrastructure management and a commitment to enhancing performance and reliability standards.