Production Systems Engineer – Mass Recovery
ITDS
Location: Krakow
16 800 - 21 840 PLN net (B2B)
Requirements
- Incident management
- Data analysis
- Site Reliability Engineering
- Networking Fundamentals
- Infrastructure Engineering
- Cloud Computing
- Observability Tools (AppDynamics, Splunk)
- CMDB Platforms (ServiceNow)
- Virtualization Platforms (ESX)
- Disaster Recovery Planning
Job description
Unleash resilience and shape the future of disaster recovery: drive enterprise-wide mass outage response and infrastructure robustness! Krakow-based opportunity with a hybrid work model.

As a Production Systems Engineer – Mass Recovery, you will be working for a leading financial institution committed to safeguarding the stability of the global financial system. You will help design and implement advanced IT resilience strategies, ensuring rapid, effective recovery from major incidents affecting critical services. This is your chance to be at the forefront of innovative disaster recovery solutions, making a tangible impact in a dynamic banking environment.

Your main responsibilities:
- Develop and maintain detailed service dependency models across applications, platforms, and infrastructure layers to support disaster recovery efforts.
- Identify, document, and analyze shared failure domains such as virtualization, storage, and network components.
- Define scenario-based blast radius models to anticipate and mitigate mass outage impacts.
- Support rapid failure correlation by analyzing service failures and providing actionable insights for recovery teams.
- Validate and challenge existing resilience data sources, ensuring alignment with real system behavior.
- Document gaps in resilience, including RTO mismatches and missing recovery pathways, to enhance recovery strategies.
- Collaborate with cross-functional teams and tooling platforms to extract and synthesize relevant operational data.
- Contribute to designing fault-tolerant architectures and recovery procedures for high-availability systems.

You're ideal for this role if you have:
- Minimum of 4 years’ experience in production engineering, site reliability engineering, or infrastructure engineering within large-scale environments.
- Strong knowledge of virtualization platforms (ESX), cloud providers, and storage/big data systems.
- Solid understanding of networking fundamentals and infrastructure topology.
- Hands-on experience with CMDB platforms (such as ServiceNow) and observability tools (such as AppDynamics and Splunk).
- Proven ability to analyze complex data sets, identify patterns, and derive practical insights.
- Experience operating in high-pressure incident management scenarios.
- Excellent communication skills in English; fluency is required.

It is a strong plus if you have:
- Previous experience within banking or financial services, especially with HSBC or similar institutions.
- Exposure to Disaster Recovery or Mass Recovery planning/execution.
- Data manipulation and extraction skills.
- Familiarity with Jira/Confluence and large distributed system environments.

Eligibility for the role:
- Only candidates with an existing legal right to work in Europe will be considered for this role.

#MAKEYourCareerBETTER

Interested? Apply now and include your CV (preferably in English) along with a statement confirming your consent to the processing and storage of your personal data.