Site Reliability Engineer (SRE)
Cantor Fitzgerald
Location: Warsaw
15 000 - 30 000 PLN gross (UoP, employment contract)
Requirements
- CI/CD
- Bash
- Linux / Unix
- Prometheus
- Grafana
- Ansible
- Docker
- Solace PubSub+
- Kubernetes
- Kafka
Job description
Company Overview
Cantor Fitzgerald is a leading global financial services firm specializing in investment banking, capital markets, institutional equity and fixed income sales and trading, commercial real estate, and prime brokerage. With a legacy of over 75 years of financial innovation and integrity, Cantor operates across major financial centres worldwide, delivering excellence and trusted expertise to its clients.
About the Role
We are seeking a skilled and proactive Reliability Engineer to join our Messaging team, responsible for the stability, performance, and scalability of enterprise messaging platforms built on Solace PubSub+ software and appliances. This role focuses on maintaining highly available, low-latency messaging infrastructure supporting mission-critical systems across both production and non-production environments. The successful candidate will play a key role in operational reliability, observability, capacity planning, and continuous improvement, while also gaining exposure to proprietary messaging APIs and platforms.
Key Responsibilities
• Administer, maintain, and support Solace PubSub+ appliances and software brokers across on-premises and cloud environments
• Provide production support for messaging-related incidents, including root cause analysis and permanent remediation
• Monitor system performance and availability using Prometheus, InfluxDB, and Grafana, proactively identifying and resolving issues
• Configure, optimise, and support Solace deployments across WAN environments, ensuring secure, low-latency message delivery
• Collaborate closely with development, application support, and infrastructure teams to troubleshoot message flow and integration issues
• Own capacity planning, scaling, and performance tuning of the messaging platform
• Automate routine operational tasks and contribute to continuous improvement of reliability processes
• Build and maintain monitoring dashboards, alerts, and metrics to provide deep visibility into messaging systems
• Produce and maintain high-quality documentation, including runbooks, topology diagrams, and configuration baselines
• Support proprietary messaging APIs and components using C++, Java, Python, and C#
• Provide support for proprietary caches and gateways integrating applications with the messaging layer
Skills & Experience Required
• At least 3 years of hands-on experience administering Solace PubSub+ messaging systems in an enterprise environment
• Strong background in production support, ideally within a 24x7 or high-availability environment
• Solid understanding of distributed systems, WAN networking, latency management, and failover strategies
• Proven experience with Prometheus and Grafana for monitoring and alerting
• Strong troubleshooting skills related to message delivery, persistence, and topic routing
• Experience with capacity management, performance tuning, and scalability of distributed platforms
• Good knowledge of Linux/Unix operating systems
• Scripting and automation skills using Bash and/or Python
• Excellent analytical and problem-solving skills with strong attention to detail
• Clear and effective communicator, comfortable working with multiple technical teams
Desirable Skills & Experience
• Experience with containerisation technologies such as Docker and Kubernetes
• Familiarity with other messaging platforms (Kafka, RabbitMQ, IBM MQ)
• Exposure to DevOps practices and CI/CD pipelines
• Experience with cloud platforms such as AWS, Azure, or GCP, including cloud-native Solace deployments
Personal Attributes
• Highly motivated, proactive, and ownership-driven
• Comfortable working in a high-availability, mission-critical environment
• Strong collaborator who works well across teams
• Methodical, organised, and capable of handling multiple priorities
• Curious and eager to learn new systems and technologies
• Calm and effective under pressure
Why Join Us?
• Work on low-latency, high-throughput messaging systems supporting mission-critical trading and enterprise platforms
• Join a highly skilled, multi-disciplinary engineering team
• Opportunity to work with a broad and modern technology stack
• Further develop both infrastructure reliability and programming skills in a complex environment