Senior Site Reliability Engineer

23,000–29,200 PLN gross per month · Employment contract (UoP)

Senior · Full-time · Employment contract

#377065 · Posted 20 days ago

Source: SOLID.Jobs

Tech Stack / Keywords

Prometheus, Grafana, Elastic Stack, Ansible, Kubernetes, Python, Azure Kubernetes Service, ELK Stack, Tempo, Thanos, Jaeger, Incident Management

Company and position

XTB is a Polish brokerage house operating globally since 2005, offering access to thousands of financial instruments such as CFDs on currencies, commodities, stock indices, cryptocurrencies, as well as stocks and ETFs listed on major global exchanges. It holds a brokerage license issued by the Polish Financial Supervision Authority and is one of the world's largest FX and CFD brokers listed on the stock exchange. The company is distinguished by its innovative and award-winning xStation platform, fast and professional customer service, and a rich educational package with online courses for investors at every level.

Requirements

At least 5 years of professional experience in SRE, Infrastructure, or DevOps roles managing high-scale, distributed environments.
Advanced programming skills in Python, focusing on scalable automation, internal tooling, and robust scripts.
Hands-on expertise in managing production-grade Kubernetes environments, configuration management with Ansible, and designing resilient infrastructure architectures within Azure Kubernetes Service and on-prem environments.
Deep proficiency in building standardized telemetry ecosystems with self-hosted open-source tools: Prometheus, Grafana, ELK Stack, Tempo, Thanos, Jaeger, and similar.
Ability to drive incident management, conduct post-incident analysis, and foster a culture of reliability and shared ownership.
Ability to leverage AI/ML techniques for SRE tasks such as AIOps, automated anomaly detection, log analysis, and optimizing reliability workflows.

Nice to have:

Experience with commercial observability and APM solutions (e.g., Datadog, Splunk, New Relic) or chaos engineering frameworks.

Responsibilities

Develop a standardized observability ecosystem and implement a conscious telemetry model focusing on structured events, distributed tracing, and intelligent sampling strategies.
Act as a strategic partner to product engineering teams, providing the platform, standards, and data for service reliability ownership; use error budgets and alerting to balance feature velocity with stability.
Enhance detection capabilities with early-warning systems and AI/ML for automated anomaly detection and intelligent data analysis to strengthen system resilience.
Build internal automation and tooling that streamlines SRE workflows, automates routine tasks, and enhances efficiency across the technology stack.
Participate in an on-call rotation for incident management, ensuring rapid resolution, effective communication, and post-incident analysis for continuous improvement.

Benefits

Salary range: 23,000–29,200 PLN gross per month (Employment contract)
Fully remote work
On-call duty (100%)
Training budget
Conference trips
Language classes
Medical package
Insurance
