XTB

Senior Site Reliability Engineer

23,000 - 29,200 PLN / month (gross), employment contract (UoP)

Senior · Full-time · Employment contract

#371680 · Posted 21 days ago · 14

Source: nofluffjobs.com

Tech Stack / Keywords

Python, Kubernetes, Docker, Ansible, Grafana, ELK Stack, Prometheus

Company and position

XTB is a global financial-industry company focused on online trading of financial instruments. It is the largest FinTech in Poland and a leader in Central and Eastern Europe, operating in several countries, including markets in Asia and South America. XTB supports employee development through various training and development programs.

Requirements

At least 5 years of professional experience in SRE, Infrastructure, or DevOps roles managing high-scale, distributed environments.
Advanced programming skills in Python focused on scalable automation, internal tooling, and robust scripts.
Hands-on expertise managing production-grade Kubernetes environments and configuration management tools like Ansible.
Experience designing resilient infrastructure architectures within Azure Kubernetes Service and on-prem environments.
Proficiency in building standardized telemetry ecosystems using self-hosted open-source tools such as Prometheus, Grafana, ELK Stack, Tempo, Thanos, and Jaeger.
Ability to drive incident management, conduct post-incident analysis, and foster a culture of reliability and shared ownership.
Ability to leverage AI/ML techniques for SRE tasks including AIOps, automated anomaly detection, log analysis, and optimizing reliability workflows.
Experience with commercial observability and APM solutions (e.g., Datadog, Splunk, New Relic) or chaos engineering frameworks is highly valued.

Responsibilities

Observability Platform Engineering:

Develop a standardized observability ecosystem.
Implement a deliberate telemetry model focusing on structured events, distributed tracing, and intelligent sampling strategies.

Reliability Enablement:

Act as a strategic partner to product engineering teams.
Provide the platform, standards, and data that product teams need to own the reliability of their services.
Use error budgets and alerting to balance feature velocity with stability.
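
For candidates less familiar with the term, an error budget is the amount of failure an SLO still permits in a given window. The sketch below is only an illustration under assumed names and numbers (`ErrorBudget`, a 99.9% target, a request-based SLO); it is not XTB's tooling or policy.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests observed in the SLO window
    failed_requests: int   # requests counted as failures in the same window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO allows to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched, 0.0 means it is spent,
        # and a negative value means the SLO has been violated.
        if self.allowed_failures == 0:
            return 0.0
        return 1.0 - self.failed_requests / self.allowed_failures

    def should_freeze_releases(self, threshold: float = 0.0) -> bool:
        # Assumed policy: pause feature releases once the remaining
        # budget drops to the threshold or below.
        return self.remaining_fraction <= threshold


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"remaining budget: {budget.remaining_fraction:.0%}")
print(f"freeze releases: {budget.should_freeze_releases()}")
```

A team with most of its budget left can ship faster; a team that has spent it shifts effort toward stability, which is the balance the responsibility above describes.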

Proactive Resilience & Protection:

Enhance detection capabilities to identify issues before customer impact.
Leverage early-warning systems and AI/ML for automated anomaly detection and intelligent data analysis.
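
As a rough illustration of what automated anomaly detection can mean at its simplest, the sketch below flags metric samples that deviate strongly from a rolling window of recent values (a rolling z-score). The function name, window size, and threshold are assumptions chosen for the example; production systems typically use more robust models.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of samples far from the mean of the preceding window."""
    history = deque(maxlen=window)  # the most recent `window` samples
    anomalies = []
    for i, value in enumerate(values):
        # Only score a sample once a full window of history exists.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows, where the z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > z_threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical latency samples in milliseconds with one spike.
latencies_ms = [100, 102, 99, 101, 100, 100, 250, 101]
print(detect_anomalies(latencies_ms))
```

Note that a flagged sample still enters the history in this sketch, so a spike temporarily widens the window's spread; excluding flagged samples is a common refinement.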

Operations & Tooling:

Build internal automation and tooling to streamline SRE workflows and automate routine operational tasks.

Incident Management & On-Call Rotation:

Participate in on-call rotation for incident management.
Ensure rapid incident resolution, effective communication, and post-incident analysis for continuous improvement.

Benefits

Sport subscription
Training budget
Private healthcare
Lunch card
An extra day off on your birthday
An extra day off for parents
Access to an e-learning platform for learning English

Sports card

Training subsidy

Healthcare

Company canteen

Paid leave

Internal training
