Bulldogjob On-site Senior

Solution Architect – Site Reliability (SRE) & Observability | f/m/d

ERGO Technology & Services S.A.

⚲ Warsaw, Gdansk

Requirements

  • Terraform
  • Kubernetes
  • Azure

Job description

As a Solution Architect, you will be responsible for defining the strategic direction of the Site Reliability Engineering (SRE) service, including observability and monitoring. This role focuses on making architectural decisions, designing integrations, ensuring best practices, and advising SRE engineers and customer teams on how to automate their service operations and use observability tools (e.g. Datadog) effectively.

How you will get the job done:

  • defining the strategic vision for site reliability engineering, observability and platform engineering, and planning tactical steps for implementation
  • leading the design and governance of automated service operations and observability tooling, ensuring scalability, security, and cost efficiency
  • scouting and analysing new observability features – matching them to business needs and notifying the engineers about potential improvements
  • designing collaboration, automation and integration models
  • defining standards and best practices for automated service operations and the observability framework, including alerting, SLOs, and distributed tracing across digital products
  • configuring, integrating, administering, and maintaining observability for all relevant digital products, using Infrastructure as Code (IaC)
  • ensuring comprehensive monitoring coverage across digital products
  • supporting, advising, and coaching SRE engineers on the best ways to automate service operations and on the use of observability tools
  • supporting SRE engineers in troubleshooting and optimizing monitoring configurations
  • guiding and mentoring engineers in implementing provisioning and configuration of observability tools using Infrastructure as Code
  • engaging with the observability tool vendors to discuss complex technical issues and feature enhancements
  • answering technical questions from product teams
  • negotiating technical aspects of observability tools during procurement discussions to ensure optimal setup

🔍 Job Ad Decoder

🔴
defining the strategic direction of the Site Reliability Engineering (SRE) service including observability and monitoring
Although this sounds like strategic planning, it may mean building a strategy from scratch in an area that is not yet fully developed or is chaotic.
🔴
advising SRE engineers and consulting customer teams on how to automate their service operations and leverage observability tools
This may mean you will have to persuade and train teams that are not yet ready to adopt new tools and processes, which can be difficult.
🔴
leading the design and governance of automated service operations, observability tooling, ensuring scalability, security, and cost efficiency
The term 'governance' may suggest a large amount of bureaucracy and many processes that you will have to follow or create, which may slow down implementation.
🔴
scouting and analysing new observability features – matching them to business needs and notifying the engineers about potential improvements
This may mean you will be responsible for research and recommendations, while the actual implementation and decisions belong to others, which can lead to frustration.
🔴
supporting, advising, and coaching SRE engineers on the best ways to automate service operations and on the use of observability tools
Although this sounds like support, it may mean you will have to solve problems and teach others who may lack the necessary skills or the willingness to learn.