JustJoin.IT Remote work Senior

Site Reliability Engineer

N-iX

⚲ Kraków

34 - 39 USD/h net (B2B)

Requirements

STACKIT
PostgreSQL
pgBouncer
Helm
Kubernetes
Python

Job description

(#4479) We are looking for an experienced Site Reliability Engineer to ensure the stability, scalability, and operational excellence of a Kubernetes-based platform running in a hybrid environment.

The project is entering a pivotal phase, with a major go-live planned for mid-February and a target audience of 75,000 users. User onboarding is already underway, with over 5,000 users connected and 15,000–20,000 expected to be active by year-end. While the system is stable, we anticipate increased activity and new challenges in January, February, and after the go-live, making this an exciting opportunity to make a real impact.

The role focuses on performance optimization, scaling strategies, observability, and reliability engineering.

Required Skills:
• 4+ years of experience as SRE / DevOps Engineer
• Strong hands-on experience with Kubernetes in production
• Experience working with hybrid infrastructure (on-prem + cloud)
• Solid knowledge of PostgreSQL performance tuning and scaling
• Experience with Qdrant or other vector databases
• Experience with CI/CD workflows, Helm, Kubernetes autoscaling, and resource optimization
• Familiarity with observability stacks (Prometheus, Grafana, ELK/Loki)
• Understanding of performance engineering and load testing
• Experience with Linux systems and networking
• Strong troubleshooting and incident-management skills
• Strong Python skills; Rust exposure is a plus
• Strong experience with infrastructure as code (Terraform)

Nice to Have:
• Experience with STACKIT or other sovereign clouds
• Experience with PgBouncer
• Knowledge of SRE practices (SLO/SLI)
• Experience in regulated or public-sector environments
• German language skills

Responsibilities:
• Operate and optimize hybrid infrastructure (on-prem & STACKIT)
• Manage and scale Kubernetes clusters
• Optimize Helm charts, resource usage, and autoscaling
• Conduct performance, load, and stress testing
• Ensure reliability, availability, and monitoring of production systems
• Tune and operate PostgreSQL
• Operate and optimize vector databases (e.g. Qdrant)
• Implement monitoring, logging, and alerting
• Support incident response and capacity planning

We offer:
• Flexible working format - remote, office-based or flexible
• A competitive salary and good compensation package
• Personalized career growth
• Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
• Active tech communities with regular knowledge sharing
• Education reimbursement
• Memorable anniversary presents
• Corporate events and team buildings
• Other location-specific benefits

2026-03-20 Apply - go to the offer ↗