Infrastructure/GPU Engineer

Remote Full-time
Cognizant is seeking a highly skilled, hands-on Infrastructure Engineer with proven experience in the physical and technical deployment of environments optimized for AI and machine learning workloads. This role focuses on NVIDIA DGX or similar systems, GPU-accelerated compute clusters, high-speed networking, and scalable storage solutions. The ideal candidate will have deep expertise in infrastructure design, deployment, workload orchestration, and performance optimization in enterprise environments.

This is a remote role in the US. The salary range for this role is $99,000 to $116,000, depending on the skills and qualifications of the candidate. Applications will be accepted until 10/21/2025.

Key Responsibilities

System Design & Deployment
- Help right-size GPU investment.
- Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.
- Design and implement scalable parallel filesystems (e.g., Lustre, BeeGFS, GPFS).
- Integrate high-speed interconnects using InfiniBand, RoCE, and RDMA.
- Collaborate on rack planning and airflow optimization.

Cluster & Infrastructure Management
- Configure and manage Slurm Workload Manager for job scheduling.
- Deploy and maintain cluster orchestration tools.
- Automate provisioning using PXE boot, Terraform, Redfish, and Kubernetes.
- Perform firmware updates, BIOS/IPMI/BMC configuration, and OS provisioning.
- Knowledge of Run.ai, ClearML, or a similar platform.

Networking & Performance Optimization
- Design and validate network topologies including IPMI, internal/external networks, and InfiniBand fabrics.
- Optimize RDMA and RoCE configurations for low-latency, high-throughput data transfers.
- Conduct performance benchmarking using GPU-Burn, NCCL, and NVSM.

Monitoring & Troubleshooting
- Implement system health checks and diagnostics across compute, storage, and network layers.
- Troubleshoot hardware/software issues and ensure reliable infrastructure operation.

Required Skills & Qualifications

Technical Expertise
- Deep understanding of NVIDIA DGX architecture, CUDA, and GPU compute.
- Strong Linux system administration and shell scripting skills.
- Experience with Slurm, parallel filesystems, and high-speed networking (InfiniBand/RDMA/RoCE).
- Familiarity with containerization (Docker), orchestration (Kubernetes), and automation tools (Ansible, Redfish).

Preferred Qualifications
- Experience with BBCM and DGX BasePOD/SuperPOD configuration.
- Certifications from NVIDIA or an equivalent OEM.
