NVIDIA

Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud

292.5k - 507k PLN/month · Employment contract
375k - 650k PLN/month · Employment contract
Senior · Full-time · Employment contract
#376554 · Added today
Source: NVIDIA

Tech Stack / Keywords

Kubernetes · Cloud · AI · Network · Testing · CI/CD · Architecture · Networking

Company and position

The DGX Cloud organization at NVIDIA focuses on delivering accelerated computing solutions for advanced AI workloads, combining hardware and software innovation.

Requirements

  • 8+ years of experience in Computer Architecture, Networking, Storage systems, Accelerators.
  • Bachelor's or Master's degree in Engineering (Electrical, Computer Engineering, Computer Science) or equivalent experience.
  • Expertise in Kubernetes and familiarity with CNCF projects.
  • Experience with large scale parallel and distributed accelerator-based systems.
  • Expertise in optimizing performance and AI workloads on large scale systems.
  • Experience with performance modeling and benchmarking at scale.
  • Proficiency in Golang and Python.
  • Background with the NVIDIA software ecosystem in the training and inference domains.
  • Expertise with at least one public cloud service provider (e.g., GCP, AWS, Azure, OCI).

Responsibilities

  • Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack, including Kubernetes and NVIDIA components.
  • Collaborate with AI researchers, developers, and customers to develop automated tests simulating real user workloads.
  • Investigate and resolve performance and scale issues in complex distributed systems.
  • Design and develop monitoring, reporting, and analysis tools for performance and scale testing.
  • Triage, debug, and root cause issues related to operating Kubernetes clusters at ultra-large scale.
  • Build and maintain a continuous performance and scale testing framework via CI/CD pipelines.
  • Document research, methodologies, and results; present findings at internal and external venues.
  • Engage with upstream communities such as Kubernetes, CNCF, and NVIDIA open-source projects to validate performance and scalability.