Senior Site Reliability Engineer
23,000–29,200 PLN / month · Employment contract (UoP)
Senior · Full-time · Employment contract (Umowa o pracę)
#377065 · Added today
Source: SOLID.Jobs
Tech Stack / Keywords
Prometheus, Grafana, Elastic Stack, Ansible, Kubernetes, Python, Azure Kubernetes Service, ELK Stack, Tempo, Thanos, Jaeger, Incident Management
Company and position
XTB is a Polish brokerage house that has operated globally since 2005. It offers access to thousands of financial instruments, including CFDs on currencies, commodities, stock indices, and cryptocurrencies, as well as stocks and ETFs listed on major global exchanges. It holds a brokerage license issued by the Polish Financial Supervision Authority and is one of the world's largest stock-exchange-listed FX and CFD brokers. The company is known for its innovative, award-winning xStation platform, fast and professional customer service, and an extensive educational offering that includes online courses for investors at every level.
Requirements
- At least 5 years of professional experience in SRE, Infrastructure, or DevOps roles managing high-scale, distributed environments.
- Advanced programming skills in Python, with a focus on scalable automation, internal tooling, and robust scripting.
- Hands-on expertise in managing production-grade Kubernetes environments, configuration management with Ansible, and designing resilient infrastructure architectures on Azure Kubernetes Service and in on-premises environments.
- Deep proficiency in building standardized telemetry ecosystems with self-hosted open-source tools: Prometheus, Grafana, ELK Stack, Tempo, Thanos, Jaeger, and similar.
- Ability to drive incident management, conduct post-incident analysis, and foster a culture of reliability and shared ownership.
- Ability to leverage AI/ML techniques for SRE tasks such as AIOps, automated anomaly detection, log analysis, and optimizing reliability workflows.
Nice to have:
- Experience with commercial observability and APM solutions (e.g., Datadog, Splunk, New Relic) or chaos engineering frameworks.
Responsibilities
- Develop a standardized observability ecosystem and implement a deliberate telemetry model built on structured events, distributed tracing, and intelligent sampling strategies.
- Act as a strategic partner to product engineering teams, providing the platform, standards, and data they need to own the reliability of their services; use error budgets and alerting to balance feature velocity with stability.
- Enhance detection capabilities with early-warning systems and with AI/ML for automated anomaly detection and intelligent data analysis, strengthening system resilience.
- Build internal automation and tooling that streamlines SRE workflows, automates routine tasks, and enhances efficiency across the technology stack.
- Participate in an on-call rotation for incident management, ensuring rapid resolution, effective communication, and post-incident analysis for continuous improvement.
Benefits
- Salary range: 23,000–29,200 PLN gross per month (Employment contract)
- Fully remote work
- On-call duty (100%)
- Training budget
- Conference trips
- Language classes
- Medical package
- Insurance