[Remote] Senior Storage Production Engineer - DGX Cloud

Remote Full-time
Note: This is a remote position open to candidates in the USA.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. They are seeking a Senior Storage Production Engineer to design, implement, and support large-scale storage clusters while ensuring scalability, high availability, and data integrity.

Responsibilities

- Design, implement, and support large-scale storage clusters, ensuring scalability, high availability, and data integrity
- Develop and maintain storage monitoring, logging, and alerting systems to ensure proactive detection and resolution of performance issues
- Work with AI/ML workloads to improve storage architectures for low-latency access, efficient caching, and high-throughput performance
- Improve the lifecycle of storage services, from inception and design to deployment, operation, and continuous optimization
- Support storage services before they become available through activities such as system build consulting, developing automation frameworks, capacity management, and launch reviews
- Maintain production storage infrastructure by supervising availability, latency, and system health, leveraging predictive analytics and AI-driven automation
- Optimize storage efficiency through compression, deduplication, tiering strategies, and intelligent workload placement
- Scale storage systems sustainably using AI/ML-driven automation, policy-based tiering, and dynamic data migration techniques
- Ensure data security and compliance by implementing encryption, access controls, and auditing mechanisms for storage systems
- Practice sustainable incident response and blameless root cause analysis
- Be part of an on-call rotation to support storage and production systems

Skills

- BS degree or equivalent experience in Computer Science, Storage Systems, or a related technical field with 8+ years of practical experience
- Experience with distributed and high-performance storage solutions, including clustered and parallel file systems, distributed object storage, and enterprise-grade storage systems
- Solid understanding of block, file, and object storage technologies, including their scalability, reliability, and performance characteristics and standard processes
- Experience with storage networking protocols such as NFS, SMB, iSCSI, S3, Fibre Channel, RDMA, and NVMe over Fabrics
- Expertise in algorithms, data structures, complexity analysis, software design, and automating maintenance of large-scale Linux-based storage systems
- Experience in one or more of the following: C/C++, Java, Python, Go, NodeJS, and Bash for storage automation, monitoring, and performance tuning
- Hands-on experience with infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform for automating storage deployments
- Experience with observability and tracing tools like InfluxDB, Prometheus, Grafana, and the Elastic stack for monitoring storage system health
- Excellent written and oral communication skills, a strong work ethic, a deep sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day
- Deep understanding of extensive distributed storage systems, replication strategies, and erasure coding techniques
- Experience in capacity planning, performance tuning, and troubleshooting high-throughput storage systems
- Experience with Git, code review, pipelines, and CI/CD for handling infrastructure as code
- Experience in analyzing and improving distributed storage system performance at scale
- Strong debugging skills with a systematic problem-solving approach to identify sophisticated storage issues
- Proven understanding of network protocols, architectures, and troubleshooting techniques, especially as they relate to storage performance, stability, and availability
- Experience using or operating private and public cloud storage solutions based on Kubernetes, OpenStack, or hybrid cloud architectures
- Ability to design and implement automated storage migration, backup, and disaster recovery strategies
- Thrive in collaborative environments and enjoy working with various teams to optimize storage performance
- Flexible in adapting to different working styles and emerging storage technologies

Benefits

You will also be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).

Company Overview

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of 10001+ employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.
