Senior DevOps Engineer (AI & Platform Operations)

120 - 125 PLN/hour, B2B (net)

Senior · Full-time · B2B

#365137 · Added 19 days ago

Source: nofluffjobs.com

Tech Stack / Keywords

Kubernetes, ITIL, Splunk, Apica, Sysdig, Prometheus, Grafana, Java, Java EE, Bash, Python

5+ years in IT operations, application support (2nd/3rd line), or a similar production-facing role
Proven track record of owning incidents end-to-end — from alert to RCA to prevention
2+ years working within an ITIL framework (incident, problem, change management)
Experience working in Agile delivery environments alongside development teams
Excellent English communication skills — able to explain technical issues clearly to both engineers and non-technical stakeholders
Proficiency with log analysis and alerting tools: Splunk, Apica, Sysdig
Observability tooling: Prometheus, Grafana — reading dashboards, tuning alerts
Comfortable operating services running on Kubernetes (checking pod health, reading logs, triggering restarts — not cluster administration)
Familiarity with Jenkins pipelines to execute and troubleshoot deployments
Experience with relational databases (Oracle, DB2) — querying, interpreting execution plans, identifying data-related incidents
Working knowledge of Spring/Hibernate application behavior, Kafka message flows, XML/JSON payloads — enough to trace an issue through the stack

Nice to have:

Incident & Problem Management:

Own the RCA process for production incidents — diagnose, resolve, and put preventive measures in place so issues don't recur

Production Monitoring & Support:

Continuously monitor service health, detect anomalies early, and act before they become incidents

Deployment Execution:

Trigger and oversee release deployments through existing CI/CD pipelines; troubleshoot failed deployments and coordinate rollbacks when needed

Environment Oversight:

Keep Pre-Production and Production environments stable and aligned — not building them from scratch, but ensuring they behave as expected day to day

Runbook & Knowledge Management:

Document operational procedures, known issues, and resolution steps to build a reliable knowledge base for the team

Cross-team Collaboration:

Work shoulder-to-shoulder with development and platform teams to triage issues, clarify operational requirements, and close the feedback loop between prod and dev

Continuous Improvement:

Identify recurring pain points and propose automation or tooling to reduce toil
Improve observability coverage — dashboards, alerts, log queries — to catch issues faster
Contribute to service continuity initiatives and disaster recovery drills

Healthcare

Sports card

DCG
