[Remote] Senior System Software Engineer - DevOps and Infrastructure Automation

Remote Full-time
Note: This is a remote position open to candidates in the USA.

NVIDIA is a leading technology company revolutionizing computer graphics, PC gaming, and accelerated computing. As a Senior System Software Engineer on NVIDIA's AI Inference Operations Team, you will focus on DevOps and infrastructure automation, designing and managing the infrastructure that powers AI inference products.

Responsibilities

- Design, build, and operate the infrastructure backbone powering AI inference products: reliable, performant, and scalable at every layer!
- Own Kubernetes deployments end-to-end across cloud and on-prem: runbooks, canary checks, post-deploy validation, and rollbacks when needed.
- Architect CI/CD pipelines for automated build, test, packaging, and release of inference libraries and their container-based software stacks.
- Build observability that actually tells the truth about platform health (dashboards, logs, metrics, automated checks), and lead first-level incident triage with clean, actionable handoffs to engineering.
- Manage cloud and on-prem environments with infrastructure-as-code (Terraform, Ansible, Helm, Crossplane), and chip away at toil using GitHub Actions, GitLab CI, and custom tooling.
- Own the security posture for infrastructure components: vulnerability scans, CVE remediation, and compliance with internal policies.
- Collaborate closely with deep learning framework engineers, compiler teams, and platform architects to streamline end-to-end deployment!

Skills

- BS/MS in CS/CE or equivalent experience, plus 7+ years operating production distributed systems (SRE / DevOps / Platform Ops).
- Deep Kubernetes expertise: components, subsystems, on-prem setup, and hands-on debugging of telemetry-heavy microservices across AWS, Azure, GCP, and on-prem.
- Strong CI/CD chops (GitLab CI, GitHub Actions), Git-based workflows, Linux systems programming, and scripting in Python and Bash.
- IaC fluency (Terraform, Ansible, Helm, Crossplane) and containerization depth (Docker, containerd, OCI).
- Proven reliability ownership (SLOs/SLIs, on-call, incident response, and post-incident reviews that drive measurable improvements), backed by hands-on experience with observability stacks like Prometheus, Grafana, and Loki.
- A clear communicator who writes runbooks people actually use!
- MLOps experience: crafting, deploying, and operating machine learning pipelines end to end.
- Experience in open-source development workflows and community engagement on projects like Triton Inference Server or ONNX Runtime.
- Familiarity with GPU software stacks: CUDA, cuDNN, TensorRT, and inference serving frameworks.
- Experience building custom test automation frameworks and using data-driven metrics to improve platform health and developer efficiency.
- Demonstrated ability to debug complex issues spanning kernel modules, container runtimes, and distributed networking.

Benefits

You will also be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).

Company Overview

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of 10,001+ employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.

