Staff Software Engineer - Production Engineering

No salary information
Senior · Full-time
#343003 · Added today
Source: Snowflake

Tech Stack / Keywords

Snowflake, AI, Golang, Testing, Kubernetes, Linux, Cloud, AWS

Company and role

Snowflake is a company focused on powering the era of the agentic enterprise by integrating AI as a core collaborator in work processes. The Production Engineering Team is responsible for driving reliability tools and processes to ensure a top-tier customer experience, including championing Service Level Objectives (SLOs), building infrastructure for rapid detection of reliability issues, and engaging in system health verification after releases.


Requirements

  • Bachelor's degree in Computer Science, a related technical field involving software engineering, or equivalent practical experience.
  • Proficient in at least one modern programming language, preferably Golang.
  • A systematic approach to problem solving and effective communication skills.

Preferred qualifications:

  • 10+ years of industry experience designing, building, and supporting large-scale systems in production.
  • Experience with modern observability tools and production monitoring practices.
  • Experience with capacity and load testing of distributed applications.
  • Experience with containers and container orchestration systems such as Kubernetes.
  • Experience deploying, managing, and operating scalable, fault-tolerant Linux infrastructure.
  • Experience with SLO-driven reliability management processes.
  • Hands-on experience with one or more public cloud providers (AWS, Azure, or GCP).
  • Ability to spot systemic issues, define roadmaps, and guide other engineers to resolve them.

Responsibilities

  • Lead improvements across the whole service lifecycle, from inception and design through deployment, operation, and refinement.
  • Scale systems sustainably through automation; drive changes that improve reliability and velocity.
  • Establish and practice low-noise incident response rotations and blameless postmortems to prevent problem recurrence.
  • Write and review code. Develop documentation and capacity plans, and debug the hardest problems on large distributed systems.
  • Collaborate with software engineers to establish, maintain, and optimize functional and performance SLOs.
  • Participate in a 24x1 on-call rotation.