Senior Site Reliability Engineer (Remote)
⚲ Warsaw, Wrocław, Kraków
26 000 - 34 000 PLN net (B2B)
Requirements
- IaC
- CI/CD
- Go
- K8S
- Kubernetes
- Docker Swarm
- Python
Job description
What’s in store for you:
You’ll be solving complex challenges and maintaining our own infrastructure with 60PB+ monthly data traffic. Here are its scale and maturity in numbers:
- 6PB+ Ceph storage
- 60PB+ monthly data traffic through our systems
- 300k+ service requests/sec processed
- 500k+ Kafka messages/sec streamed
Your day-to-day:
• Own and evolve Webshare's production infrastructure - lead the migration from Docker Swarm to Kubernetes (or hybrid K8s + Ansible).
• Maintain high availability across hundreds of servers and ~50 services.
• Drive observability in cooperation with the development team.
• Establish and enforce IaC practices, CI/CD pipeline reliability, and change management processes.
• Participate in the on-call rotation alongside backend developers.
• Respond to and lead incident resolution, run post-mortems, and drive systematic remediation.
• Contribute platform tooling that improves developer experience and reduces infrastructure toil.
• Keep backend engineers informed and capable - no silos, shared infrastructure ownership.
Your skills & experiences:
• Have built and operated highly available infrastructure at a comparable scale - hundreds of servers, dozens of services, real production load.
• Hands-on K8s in self-hosted / bare-metal environments.
• Confident with Infrastructure as Code.
• Have owned CI/CD pipelines end-to-end (GitLab CI or equivalent).
• Have been on call in a production environment.
• Proactive - surfaces problems before being asked, keeps the team informed without prompting.
• Scripting and development skills.
NICE TO HAVE REQUIREMENTS:
• Led at least one major infrastructure migration - planned, executed, and stabilised it.
• Python and/or Go familiarity - backend is Python, edge services are Go.
• Exposure to proxy, networking-heavy infrastructure.
• Previous experience in a small team where developers shared infrastructure responsibility.
• Familiarity with edge clusters or split compute/edge architectures.
Please note that only the selected candidates will be contacted for further steps.
It would be greatly appreciated if you could share your LinkedIn profile together with the application.
Up for the challenge? Let’s talk!
🔍 Job Ad Decoder
🔴
Own and evolve Webshare's production infrastructure - lead the migration from Docker Swarm to Kubernetes (or hybrid K8s + Ansible).
You will be responsible for a critical and potentially problematic migration that may be difficult and time-consuming.
🔴
Maintain high availability across hundreds of servers and ~50 services.
This means working with a large number of components, which increases complexity and the number of potential points of failure.
🔴
Drive observability in cooperation with the development team.
This may mean that the current level of observability is low and requires significant work from the ground up.
🔴
Participate in the on-call rotation alongside backend developers.
You will be expected to take on-call shifts, which may mean working outside standard hours.
🟡
Keep backend engineers informed and capable - no silos, shared infrastructure ownership.
This may suggest that collaboration has been difficult so far or that transparency was lacking.