NVIDIA

Solutions Architect, DevOps

221.3k - 383.5k PLN/year, employment contract (UoP)
292.5k - 507k PLN/year, employment contract (UoP)
Senior · Full-time · Employment contract
#363922 · Added yesterday
Source: NVIDIA

Tech Stack / Keywords

DevOps, Cloud, AI, Architecture, Kubernetes, Linux, Networking, CUDA

Company and position

NVIDIA is a company involved in building and advising on large and fast AI/HPC systems worldwide, serving academic and commercial organizations with deep learning and data analytics solutions.


Requirements

  • BS/MS/PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
  • 5+ years of professional experience managing scalable cloud environments and working in automation engineering roles.
  • Proven understanding of networking fundamentals, data center architectures, and hands-on experience managing HPC/AI clusters.
  • Hands-on experience deploying, configuring, and optimizing NVIDIA GPU-accelerated infrastructure.
  • Extensive experience with Kubernetes for container orchestration, resource scheduling, scaling, and integration with GPU-accelerated and HPC environments.
  • Strong familiarity with HPC and AI technologies including CPUs, GPUs, and high-speed interconnects.
  • Deep knowledge of Linux (Red Hat, Ubuntu), OS-level security, and protocols.
  • Proficiency in Python and Bash scripting, configuration management, and Infrastructure-as-Code tools such as Ansible and Terraform.
  • Experience with observability stacks like Grafana, Loki, and Prometheus.
  • Strong background in crafting scalable solutions and providing consultative support to customers, including leading architectural reviews and presenting to executive stakeholders.

Nice to have:

  • Knowledge of CI/CD pipelines for software deployment and automation.
  • Hands-on knowledge of Kubernetes operators for GPU and Network management.
  • Practical experience with SLURM, MPI, enroot, and job provisioning.
  • Experience with software change management of clusters across compute, network, and storage.
  • Experience with NVIDIA Base Command Manager (BCM) and RDMA-based fabrics (InfiniBand or RoCE) in HPC or AI environments.

Responsibilities

  • Advise on and help maintain large-scale computational and AI infrastructure, including monitoring, logging, and workload orchestration (Kubernetes and Linux job schedulers).
  • Provide consultative guidance and perform hands-on troubleshooting across the full stack from bare metal and operating system through software stack, container platform, networking, and storage.
  • Assess customer environments and recommend optimized, production-ready Kubernetes-based container platforms integrated with enterprise-grade networking and storage solutions.
  • Develop, refine, and document standard methodologies and operational guidelines for internal teams and customer stakeholders.
  • Support development activities and engage in POCs/POVs to validate new features, architectures, and upgrade approaches.
  • Create and deliver high-quality documentation, including runbooks, onboarding materials, and best-practice guides.
  • Act as the technical leader for assigned customer accounts, providing strategic guidance on DevOps and platform architecture and influencing long-term infrastructure and operations decisions.