Margo

(Senior) Network Reliability Engineer (NRE) - AI GPU Clusters

200-250 PLN/hour (B2B)
Senior · Full-time · B2B
#365150 · Added today
Source: nofluffjobs.com

Tech Stack / Keywords

AI, Security, Communication skills, Go, Python, Bash, Linux, Ubuntu, Networking, TCP, DNS, BGP, HPC, GPU, Prometheus, Grafana, Ansible, Salt, AWX, Relational database, GitLab, VLAN, PXE

Company and Role

Margo is working with a client based in California, USA, on AI GPU cluster infrastructure. The role involves supporting and scaling the production environments that run this AI infrastructure.


Requirements

  • Experience with Go or Python
  • Strong scripting skills (Bash, Python)
  • Hands-on experience with Linux systems (Ubuntu/Debian)
  • Hands-on experience with GPU and HPC infrastructure (preferred)
  • Knowledge of networking (LAN/VLAN, TCP/IP, DNS, BGP, load balancing, IPv6)
  • Experience with network boot systems such as PXE
  • Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic)
  • Comfortable with Infrastructure-as-Code tools (Ansible, Salt, AWX)
  • Experience managing relational databases (MariaDB)
  • Understanding of CI/CD pipelines (GitLab)
  • Comfortable communicating in English, both written and spoken
  • Proactive and solution-oriented mindset
  • Passion for automation and continuous improvement
  • Strong collaboration and communication skills
  • Ability to work both independently and as part of a team
  • Willingness to mentor and share knowledge

Responsibilities

  • Build large-scale AI infrastructure, including monitoring, diagnosing, and remediating production incidents
  • Troubleshoot high-impact production issues in collaboration with other engineering teams
  • Participate in an on-call rotation to handle incidents and ensure service continuity
  • Implement and maintain observability solutions to monitor AI infrastructure and application health
  • Contribute to AI infrastructure lifecycle management across different environments and countries
  • Promote and apply best practices in stability, resiliency, scalability, and security
  • Maintain clear technical documentation for tools and procedures
  • Contribute to system and tool evolution based on production feedback
  • Collaborate closely with development teams to ensure infrastructure readiness
  • Participate in team rituals and knowledge-sharing initiatives

What We Offer

  • International environment
  • Sport subscription
  • Private healthcare
  • Flat structure
  • International projects
  • Free coffee
  • Canteen
  • Bike parking
  • Playroom
  • Shower
  • Free snacks
  • Modern office
  • No dress code

Additional Information

The client is based in California, USA. Working hours start at around 18:00 CEST. This is a long-term project with a minimum duration of one year.
