[Remote] Senior AI and ML HPC Cluster Engineer

Remote Full-time
Note: This is a remote position open to candidates in the USA.

NVIDIA is a leading technology company that has revolutionized the field of computing with its GPU innovations. The role involves providing leadership in the design and implementation of GPU compute clusters for demanding deep learning and high-performance computing workloads, along with managing large-scale HPC systems and developing automation solutions.

Responsibilities
- Provide leadership and strategic guidance on the management of large-scale HPC systems, including the deployment of compute, networking, and storage
- Develop and improve our ecosystem around GPU-accelerated computing, including developing scalable automation solutions
- Build and maintain heterogeneous AI and ML clusters on-premises and in the cloud
- Create and cultivate customer and cross-team relationships to reliably sustain the clusters and meet evolving user needs
- Support our researchers in running their workloads, including performance analysis and optimization
- Conduct root cause analysis and suggest corrective action; proactively find and fix issues before they occur

Skills
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience
- Minimum of 5 years of experience designing and operating large-scale compute infrastructure
- Experience with advanced AI/HPC job schedulers, such as Slurm, K8s, PBS, RTDA, or LSF
- Proficiency in administering CentOS/RHEL and/or Ubuntu Linux distributions
- Solid understanding of cluster configuration management tools such as Ansible, Puppet, and Salt
- In-depth understanding of container technologies like Docker, Singularity, Podman, Shifter, and Charliecloud
- Proficiency in Python programming and bash scripting
- Applied experience with AI/HPC workflows that use MPI
- Experience analyzing and tuning performance for a variety of AI/HPC workloads
- Passion for continual learning and staying ahead of emerging technologies and effective approaches in the HPC and AI/ML infrastructure fields
- Background with NVIDIA GPUs, CUDA programming, NCCL, and MLPerf benchmarking
- Experience with machine learning and deep learning concepts, algorithms, and models
- Familiarity with InfiniBand with IPoIB and RDMA
- Understanding of fast, distributed storage systems like Lustre and GPFS for AI/HPC workloads
- Familiarity with deep learning frameworks like PyTorch and TensorFlow

Benefits
- Equity
- Benefits

Company Overview
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of more than 10,000 employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.

