Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud
292,500 - 507,000 PLN / month · Employment contract
375,000 - 650,000 PLN / month · Employment contract
Senior · Full-time · Employment contract
#376554 · Added today
Source: NVIDIA
Tech Stack / Keywords
Kubernetes, Cloud, AI, Network, Testing, CI/CD, Architecture, Networking
Company and position
The DGX Cloud organization at NVIDIA focuses on delivering accelerated computing solutions for advanced AI workloads, combining hardware and software innovation.
Requirements
- 8+ years of experience in Computer Architecture, Networking, Storage systems, Accelerators.
- Bachelor's or Master's degree in Engineering (Electrical, Computer Engineering, Computer Science) or equivalent experience.
- Expertise in Kubernetes and familiarity with CNCF projects.
- Experience with large-scale parallel and distributed accelerator-based systems.
- Expertise in optimizing performance and AI workloads on large-scale systems.
- Experience with performance modeling and benchmarking at scale.
- Proficiency in Golang and Python.
- Background with the NVIDIA software ecosystem in the training and inference domains.
- Expertise with at least one public cloud service provider (e.g., GCP, AWS, Azure, OCI).
Responsibilities
- Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack, including Kubernetes and NVIDIA components.
- Collaborate with AI researchers, developers, and customers to develop automated tests simulating real user workloads.
- Investigate and resolve performance and scale issues in complex distributed systems.
- Design and develop monitoring, reporting, and analysis tools for performance and scale testing.
- Triage, debug, and root-cause issues related to operating Kubernetes clusters at ultra-large scale.
- Build and maintain a continuous performance and scale testing framework via CI/CD pipelines.
- Document research, methodologies, and results; present findings at internal and external venues.
- Engage with upstream communities such as Kubernetes, CNCF, and NVIDIA open-source projects to validate performance and scalability.