[Remote] Senior Systems Engineer, Storage - DGX Cloud

Remote Full-time
Note: This is a remote position open to candidates in the USA. NVIDIA is a leading technology company known for its innovative GPU cloud services. The Senior Systems Engineer will design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, ensuring reliability and performance through automation and observability.

Responsibilities

- Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them
- Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations
- Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable
- Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure
- Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement
- Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity
- Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews
- Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems

Skills

- BS degree (or equivalent experience) in Computer Science or a related technical field involving coding
- 12+ years of practical experience
- Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production
- Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems
- Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack
- Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems
- Proficiency in one or more of the following: Python, Go, or Java
- Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform
- Customer-first mindset with a focus on customer satisfaction and a passion for ensuring customer success
- Experience with Git, code review, pipelines, and CI/CD
- Experience using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker
- Interest in crafting, analyzing, and fixing large-scale distributed systems, with strong debugging skills and a systematic problem-solving approach
- Experience designing storage- or data-focused tooling and automating its operation at scale
- Ability to thrive in collaborative environments, enjoyment of working with various teams, and flexibility in adapting to different working styles

Benefits

You will also be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).

Company Overview

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of 10001+ employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.

