Senior Site Reliability Engineer
Location: Warszawa
28 560 - 38 640 PLN (B2B)
Requirements
- SRE
- DevOps
- AWS Cloud Services
- IaC
- Docker
- CI/CD Pipelines
- GitHub Actions
- PostgreSQL
- Amazon RDS
- SQL
- VPC
- DNS
- Troubleshooting
- dig
- traceroute
- UNIX/Linux
- Prometheus
- Grafana
- Datadog
- Dynatrace
- Automation
- AI
- Problem-Solving
- Incident management skills
Job description
About the project:
We are looking for an experienced Site Reliability Engineer to ensure the reliability, scalability, and performance of large-scale, cloud-based web applications. You will work closely with software development, cloud operations, and platform teams to build and maintain resilient infrastructure and improve system stability.

Requirements:
- 5+ years of experience in SRE, DevOps, or similar roles
- Strong experience with AWS cloud services and Infrastructure-as-Code tools
- Hands-on experience with Kubernetes and containerized environments
- Proficiency in Docker and CI/CD pipelines (e.g., GitHub Actions)
- Solid understanding of databases (e.g., PostgreSQL, Amazon RDS) and SQL
- Knowledge of networking concepts (VPC, DNS) and troubleshooting tools such as dig and traceroute
- Strong Linux/Unix administration skills
- Experience with observability tools (e.g., Prometheus, Grafana, Datadog, Dynatrace)
- Familiarity with automation and AI-based solutions for infrastructure
- Strong problem-solving and incident management skills

Daily responsibilities:
- Design and maintain monitoring, alerting, and incident response systems to ensure high availability
- Collaborate closely with engineering, product, and architecture teams
- Build and manage cloud infrastructure on AWS using Infrastructure-as-Code (e.g., Terraform, Pulumi)
- Operate and optimize Kubernetes environments (e.g., EKS)
- Develop and maintain containerized applications using Docker
- Improve CI/CD pipelines and drive automation across deployment processes
- Implement and manage observability tooling (logging, metrics, tracing)
- Participate in incident management, postmortems, and reliability improvements
- Support capacity planning, disaster recovery, and system scaling
- Contribute to security, compliance, and operational best practices
- Develop automation and AI-driven solutions for monitoring and incident prevention