Senior DevOps / SRE (Platform Reliability Engineer)
emagine Polska
Location: Lisbon
Requirements
- Microsoft Platform
- Operations
- Python
- Splunk
- Jenkins
- Cloud
- TCP/IP
- Security
- Microsoft Azure
- CI/CD
Job Description
We are looking for a Senior DevOps / Site Reliability Engineer (SRE) to ensure the reliability, scalability, performance, and security of our platform and cloud infrastructure. You will play a key role in building and operating cloud-native systems, improving observability, automating operations, implementing SRE best practices (SLOs/SLIs), and supporting development teams in delivering highly available services.

Key Responsibilities
• Design, implement, and maintain highly available and scalable infrastructure on AWS.
• Own and improve the reliability of production systems using SRE principles (SLOs, SLIs, error budgets).
• Build and manage CI/CD pipelines to support fast and safe software delivery.
• Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, CloudFormation, or similar tools.
• Manage and optimize container orchestration platforms (Kubernetes, Docker, Helm).
• Implement and maintain monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK, Datadog, Splunk).
• Lead incident response, perform root cause analysis, and write postmortems to drive continuous improvement.
• Improve system performance, capacity planning, scaling strategies, and disaster recovery processes.
• Collaborate closely with development teams to improve deployment strategies and system resilience.
• Implement security best practices (IAM, secret management, vulnerability scanning, patching).
• Define operational standards, runbooks, documentation, and best practices for platform reliability.
• Participate in the on-call rotation and provide senior-level support for critical production issues.

Key Requirements
• 5+ years of experience in DevOps / SRE / Cloud Infrastructure / Platform Engineering.
• Strong expertise in Linux systems administration and troubleshooting.
• Proven experience with Kubernetes in production environments.
• Strong experience with CI/CD tools (GitLab CI, Jenkins, GitHub Actions, Azure DevOps).
• Solid knowledge of Infrastructure as Code (Terraform highly preferred).
• Experience with cloud platforms: AWS, Azure, or Google Cloud.
• Strong understanding of networking fundamentals (TCP/IP, DNS, load balancing, reverse proxies).
• Experience with observability tools: monitoring, metrics, logging, tracing.
• Strong scripting skills (Bash, Python, or similar).

Nice to Have
• Experience with additional cloud platforms (Azure, GCP).
• Strong understanding of networking fundamentals.