IT - Site Reliability Engineer
emagine Polska
⚲ Pune
Requirements
- Incident management
- Configuration management
- Configuration Management (ITIL)
- Operations
- Python
- Cloud
- PowerShell
- Security
- Microsoft Azure
- CI/CD
Job description
Introduction & Summary
We are seeking a dedicated Site Reliability Engineer (SRE) to join our team. The ideal candidate will combine a strong technical background with operational excellence in ensuring the reliability, availability, and performance of critical systems. You will play a key role in monitoring, troubleshooting, and resolving issues, while applying your expertise in observability to support robust incident management.

Main Responsibilities
Your core duties will include:
- Monitoring production systems and services using observability tools.
- Responding to incidents, alerts, and outages in real time.
- Participating in a rotating on-call schedule.
- Designing, implementing, and maintaining observability solutions.
- Collaborating with development and infrastructure teams to ensure system reliability.
- Automating operational tasks and documenting procedures.
- Conducting post-incident reviews and proposing monitoring enhancements.

Key Requirements
- Bachelor's degree in Information Technology, Computer Science, or a related field.
- 2-5 years of experience in cloud and operations engineering.
- Proficiency with Azure services; AWS and GCP experience is a plus.
- Hands-on experience with Infrastructure-as-Code (IaC) tools like Terraform.
- Strong scripting skills in Python, Bash, or PowerShell.
- Familiarity with GitLab CI/CD tools integrated with Azure.
- Proficiency in monitoring and logging tools.

Nice to Have
- Master's degree or relevant certifications.

Other Details
This position involves a 24/7 shift rotation, ensuring continuous system reliability and performance. The role emphasizes proactive monitoring and efficient incident response in a collaborative environment.