Senior DevOps Engineer (AI & Platform Operations)
120-125 PLN/hour (B2B)
Senior · Full-time · B2B
#365137 · Added today
Source: nofluffjobs.com
Tech Stack / Keywords
Kubernetes, ITIL, Splunk, Apica, Sysdig, Prometheus, Grafana, Java, Java EE, Bash, Python
Requirements
- 5+ years in IT operations, application support (2nd/3rd line), or a similar production-facing role
- Proven track record of owning incidents end-to-end — from alert to RCA to prevention
- 2+ years working within an ITIL framework (incident, problem, change management)
- Experience working in Agile delivery environments alongside development teams
- Excellent English communication skills — able to explain technical issues clearly to both engineers and non-technical stakeholders
- Proficiency with log analysis and alerting tools: Splunk, Apica, Sysdig
- Observability tooling: Prometheus, Grafana — reading dashboards, tuning alerts
- Comfortable operating services running on Kubernetes (checking pod health, reading logs, triggering restarts — not cluster administration)
- Familiarity with Jenkins pipelines to execute and troubleshoot deployments
- Experience with relational databases (Oracle, DB2) — querying, interpreting execution plans, identifying data-related incidents
- Working knowledge of Spring/Hibernate application behavior, Kafka message flows, XML/JSON payloads — enough to trace an issue through the stack
Nice to have:
- Java/J2EE development background
- IBM DataStage operational experience
- Scripting (Bash, Python) for automation of repetitive operational tasks
- Ansible for applying configuration changes in controlled operational scenarios
Responsibilities
Incident & Problem Management:
- Own the RCA process for production incidents — diagnose, resolve, and put preventive measures in place so issues don't recur
Production Monitoring & Support:
- Continuously monitor service health, detect anomalies early, and act before they become incidents
Deployment Execution:
- Trigger and oversee release deployments through existing CI/CD pipelines; troubleshoot failed deployments and coordinate rollbacks when needed
Environment Oversight:
- Keep Pre-Production and Production environments stable and aligned — not building them from scratch, but ensuring they behave as expected day to day
Runbook & Knowledge Management:
- Document operational procedures, known issues, and resolution steps to build a reliable knowledge base for the team
Cross-team Collaboration:
- Work shoulder-to-shoulder with development and platform teams to triage issues, clarify operational requirements, and close the feedback loop between prod and dev
Continuous Improvement:
- Identify recurring pain points and propose automation or tooling to reduce toil
- Improve observability coverage — dashboards, alerts, log queries — to catch issues faster
- Contribute to service continuity initiatives and disaster recovery drills
Offer
- Private medical care
- Co-financed sports card
- Ongoing support from a dedicated consultant
- Employee referral program
DCG