SRE
140–180 PLN/hour, B2B (net)
Senior · Full-time · B2B
#308009 · Added two months ago · 58
Source: nofluffjobs.com
Tech Stack / Keywords
DNS, TCP, HTTP, BGP, Proxy, Terraform, Ansible, AWS Lambda, GitLab CI, Unix, Linux, Datadog, Grafana, CDN, OTT
Company and position
Antal is hiring a Site Reliability Engineer (SRE) to support the development and maintenance of international CDN platforms, both on-premises and cloud-based, impacting OTT services globally.
Requirements
- Higher technical education (Computer Science, Networks/Telecommunications).
- Minimum 4–5 years of experience in SysOps / DevOps / SRE roles.
- Solid networking fundamentals: DNS, TCP, HTTP, routing (BGP), caching, proxy.
- Knowledge of tools: Terraform, Ansible, AWS Lambda, GitLab CI/CD.
- Very good knowledge of Unix/Linux systems.
- Experience with monitoring tools (Datadog, Grafana).
- Nice to have: knowledge of CDN/OTT and QoS topics.
- Analytical thinking, independence, and good work organization.
- Ability to collaborate with technical and non-technical teams.
- Fluent English; French is a plus.
- Motivation to work on large-scale, highly available systems.
- Interest in automation, observability, and performance engineering.
Responsibilities
CDN Reliability & Operations:
- Ensure availability, resilience, and high performance of CDN platforms (cloud, bare metal, international networks, IXs, ISP-side cache).
- Regularly analyze CDN capacity, performance, and traffic forecasts.
- Participate in deployments, production rollouts, and OTT consumption analysis across multiple regions.
- Monitor key metrics (latency, throughput, cache hit ratio, error rate) and implement optimizations.
- Participate in incident handling, root cause analysis, and reliability improvement plans.
- Occasionally support DevOps teams with operational tasks.
Observability & Monitoring:
- Build and maintain observability layers for all CDN environments using Datadog.
- Create and maintain standardized dashboards, alerts, SLO/SLA, and log pipelines.
- Design scalable monitoring solutions capable of handling large traffic volumes.
- Implement automated health checks, anomaly detection, and alert workflows.
- Improve data collection and visualization processes for technical and business teams.
Development of Tools & Automation:
- Develop scripts and workflows (Python/Bash/API) for metrics collection, cost analysis, and operational data.
- Build internal tools for log analysis, audience and traffic visualization, CDN configuration validation, diagnostics, troubleshooting, and cache testing.
- Support automation based on Terraform, CI/CD, and automatic configuration rollouts.
Collaboration & CDN Governance:
- Collaborate with OTT engineering, DevOps, Network, Security, Data teams, and international units.
- Develop and maintain global standards (latency, TTL, caching, observability, security, costs).
- Share knowledge with teams across Europe, Africa, and Asia.
- Prepare technical documentation and deployment materials.
- Work with ISPs, cloud providers, and operations teams to resolve distribution issues.
- Support major events (sports, live, peak traffic) with preparation, monitoring, and post-event analysis.
Offer
- Start: April/May 2026
- Contract type: B2B
- Long-term cooperation
- 100% remote work
- Benefits: Multisport card and Luxmed healthcare
Antal Sp. z o.o.