Systems Software Engineer, Kubernetes Scale - DGX Cloud
176,250 - 305,500 PLN / month, employment contract (UoP)
221,250 - 383,500 PLN / month, employment contract (UoP)
Mid · Full-time · Employment contract
#376856 · Added today
Source: NVIDIA
Tech Stack / Keywords
Kubernetes, Cloud, AI, Network, Testing, CI/CD, Architecture, Networking
Company and position
The DGX Cloud organization at NVIDIA focuses on delivering accelerated computing solutions for advanced AI workloads by integrating cutting-edge hardware and software innovations.
Requirements
- 2+ years of experience in Computer Architecture, Networking, Storage systems, Accelerators.
- Bachelor's or Master's degree in Engineering (Electrical, Computer Engineering, Computer Science) or equivalent experience.
- Expertise in Kubernetes and familiarity with CNCF projects.
- Experience with large scale parallel and distributed accelerator-based systems.
- Expertise in optimizing performance and AI workloads on large scale systems.
- Experience with performance modeling and benchmarking at scale.
- Proficiency in Golang and Python.
- Background with the NVIDIA software ecosystem in training and inference domains.
- Expertise with at least one public cloud service provider (e.g., GCP, AWS, Azure, OCI).
Responsibilities
- Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack, including Kubernetes and NVIDIA components.
- Collaborate with AI researchers, developers, and customers to develop automated tests simulating real user workloads.
- Investigate and resolve performance and scale issues in complex distributed systems.
- Design and develop monitoring, reporting, and analysis tools for performance and scale testing.
- Triage, debug, and root cause issues related to operating Kubernetes clusters at ultra-large scale.
- Build and maintain a continuous performance and scale testing framework via CI/CD pipelines.
- Document research, methodologies, and results; present findings at internal and external venues.
- Engage with upstream communities such as Kubernetes, CNCF, and NVIDIA open-source projects to validate performance and scalability.