Observability, Staff Infrastructure Engineer

350,700 - 474,400 PLN/year · Employment contract (gross)
Senior · Full-time · Employment contract
#356324 · Added today
Source: Graphcore

Tech Stack / Keywords

AI · Architecture · Cloud · Testing · Unit Testing · Prometheus · Grafana · Kafka

Company and position

Graphcore is a company building the future of AI compute, with expertise in semiconductor, software, and AI. It is part of the SoftBank Group and delivers technology into the SoftBank AI ecosystem. The company is expanding globally to address AI opportunities.


Requirements

  • BSc or MSc degree in Computer Engineering, Computer Science, or related field, or equivalent experience.
  • Proven experience architecting and implementing scalable, performant, reliable cluster management systems including telemetry collection and analysis engines.
  • Experience managing large-scale datacenters with a focus on hardware observability solutions.
  • Experience maintaining and scaling modern observability stacks using Prometheus, Grafana, OTEL, ClickHouse, Kafka, Superset, or Elastic Stack.
  • Understanding of secure telemetry practices and data exposure controls.
  • Working knowledge of Datadog, Dynatrace, or Splunk.
  • Experience with large-scale telemetry datasets, time series databases, down-sampling techniques, and creating actionable dashboards.
  • Experience with automation technologies such as Ansible or Terraform.
  • Experience with containerization technologies including Docker and Kubernetes.
  • Experience managing or developing in Linux environments.
  • Strong skills in at least one of C, C++, Go, or Python.
  • Excellent written and verbal communication skills.

Nice to have:

  • Knowledge of cloud-native development and deployment methodologies (SaaS/PaaS/IaaS).
  • Knowledge of data center networking and monitoring best practices.
  • Knowledge of monitoring, observability, and management solutions used by hyperscalers.
  • Knowledge of declarative management systems.

Responsibilities

  • Contribute to all phases of product development, including definition, architecture, design, implementation, debugging, testing, and early customer support.
  • Design and implement fault-remediation solutions at scale.
  • Implement multi-component integrations based on Graphcore and third-party technology stacks, covering the full path from data ingestion to decision making, with seamless management, monitoring, and UI.
  • Create reference designs including documentation, configuration files, scripts, and source code.
  • Deploy solutions internally to support engineering teams in debugging, performance analysis, benchmarking, and test/QA at all scales.
  • Maintain and improve deployed infrastructure over the long term to provide the best service for customers.
  • Ensure solutions are properly tested by collaborating with development and QA teams to enhance unit testing and comprehensive test plans.
  • Mentor and guide junior engineers, fostering continuous learning and improvement.

Offer

  • Competitive salary.
  • Annual leave policy.
  • Medical and dental health plans.
  • Gym card.
  • Employee pension matched up to 4%.
  • Yearly review of benefits to ensure value and reward.
  • Inclusive work environment with equal opportunity process.
  • Flexible approach to interviews and reasonable adjustments if required.
Paid leave · Healthcare · Sports card

Other information

Applicants must hold the right to work in Poland. Visa sponsorship or support for visa applications is not provided.
