Senior SRE Engineer (Observability Focus)
Salary not disclosed
Senior · Full-time
#372227 · Posted 6 days ago
Source: Capital.com
Tech Stack / Keywords
Architecture, Java, Python, Kafka, Grafana, DevOps, Prometheus, SOLID
Company and Role
We are a leading trading platform that is ambitiously expanding globally. Our top-rated products have won prestigious industry awards for cutting-edge technology and a seamless client experience. We are building out our observability practice and need a senior engineer to own it end to end.
Requirements
- 6+ years in a DevOps, SRE, or platform engineering role, with at least 2 years focused on observability tooling at production scale.
- Deep hands-on experience with VictoriaMetrics or Prometheus, including MetricsQL/PromQL, exporters, service discovery, remote write, downsampling, and retention management.
- Solid OpenSearch or Elasticsearch skills, including cluster operations, Query DSL, ISM (OpenSearch) or ILM (Elasticsearch) policies, and ingest pipeline design.
- Production experience with OpenTelemetry including Collector configuration, OTLP, context propagation, and instrumentation across multiple languages.
- Strong Kafka skills including producer/consumer patterns, consumer group management, Kafka Connect, Schema Registry, and JMX-based monitoring; Strimzi experience is a plus.
- Proficiency with log shippers such as Fluent Bit, Vector, or Fluentd, and with structured log parsing and normalization.
- Working knowledge of Kubernetes (operators, Helm), Argo CD/GitOps, and Terraform/Ansible.
- Comfortable in a hybrid AWS + on-prem environment, with a solid understanding of networking as it applies to scraping and shipping pipelines.
- Scripting ability in Bash or Python for automation and tooling.
- Strong communication skills and English proficiency.
Responsibilities
- Own the full observability stack: metrics (VictoriaMetrics), logs (OpenSearch), and traces (OpenTelemetry) from pipeline design to day-2 operations.
- Architect and run VictoriaMetrics cluster topology including vmagent scraping, remote write configuration, vmalert rules, and cardinality control.
- Operate OpenSearch clusters with index lifecycle management, hot-warm-cold architecture, shard tuning, and ingest pipelines via Data Prepper.
- Build and maintain OpenTelemetry (OTel) Collector pipelines and instrument services across Java, Python, and JS/TS stacks.
- Run Kafka as the telemetry transport layer including topic design, partition strategy, consumer group lag monitoring, and throughput tuning.
- Manage log shipping infrastructure using Fluent Bit, Vector, or Fluentd; define structured logging standards and field normalization.
- Build Grafana dashboards and alerting that are clear, actionable, and well-structured.
- Work with platform and application teams to improve sampling strategies, batching, and context propagation.
- Contribute to incident response, post-mortems, and reliability improvements driven by observability signals.
- Mentor engineers on observability practices, tooling, and structured logging standards.
Benefits
- Competitive salary.
- Work-life harmony with a caring company culture.
- Generous annual leave policy.
- Employee referral program.
- Comprehensive health and pension benefits.
- 30 extra days to work remotely from anywhere in the world (with some restrictions).
- Two additional paid volunteer days per year.
Paid leave
Healthcare
Insurance
Relocation package
Bonuses
Other Information
Our company has an Internal Reporting Procedure available from Human Resources upon request. You may report a violation under the terms specified therein.