Senior Site Reliability Engineer
23,000 - 29,200 PLN / month
Senior · Full-time · Employment contract (UoP)
#371680 · Added yesterday
Source: nofluffjobs.com
Tech Stack / Keywords
Python, Kubernetes, Docker, Ansible, Grafana, ELK Stack, Prometheus
Company and position
XTB is a global financial-industry company focused on online trading of financial instruments. It is the largest FinTech in Poland and a leader in Central and Eastern Europe, with operations in several countries, including in Asia and South America. XTB supports employee development through a range of training and development programs.
Requirements
- At least 5 years of professional experience in SRE, Infrastructure, or DevOps roles managing high-scale, distributed environments.
- Advanced programming skills in Python focused on scalable automation, internal tooling, and robust scripts.
- Hands-on expertise managing production-grade Kubernetes environments and configuration management tools like Ansible.
- Experience designing resilient infrastructure architectures within Azure Kubernetes Service and on-prem environments.
- Proficiency in building standardized telemetry ecosystems using self-hosted open-source tools such as Prometheus, Grafana, ELK Stack, Tempo, Thanos, and Jaeger.
- Ability to drive incident management, conduct post-incident analysis, and foster a culture of reliability and shared ownership.
- Ability to leverage AI/ML techniques for SRE tasks including AIOps, automated anomaly detection, log analysis, and optimizing reliability workflows.
- Experience with commercial observability and APM solutions (e.g., Datadog, Splunk, New Relic) or chaos engineering frameworks is highly valued.
Responsibilities
Observability Platform Engineering:
- Develop a standardized observability ecosystem.
- Implement a deliberate telemetry model built on structured events, distributed tracing, and intelligent sampling strategies.
Reliability Enablement:
- Act as a strategic partner to product engineering teams.
- Provide the platform, standards, and data that teams need to own the reliability of their services.
- Use error budgets and alerting to balance feature velocity with stability.
Proactive Resilience & Protection:
- Enhance detection capabilities to identify issues before customer impact.
- Leverage early-warning systems and AI/ML for automated anomaly detection and intelligent data analysis.
Operations & Tooling:
- Build internal automation and tooling to streamline SRE workflows and automate routine operational tasks.
Incident Management & On-Call Rotation:
- Participate in on-call rotation for incident management.
- Ensure rapid incident resolution, effective communication, and post-incident analysis for continuous improvement.
Benefits
- Sport subscription
- Training budget
- Private healthcare
- Lunch card
- An extra day off on your birthday
- An extra day off for parents
- Access to an e-learning platform for learning English
Sports card
Training subsidies
Healthcare
Company canteen
Paid leave
Internal training