[Remote] Datacenter Hardware Operations Technician Lead, Industrial Compute

Remote Full-time
Note: This is a remote job open to candidates in the USA.

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. They are seeking a Datacenter Hardware Operations Technician Lead to serve as the senior on-site technical authority for hardware reliability and fleet health at one of OpenAI’s flagship AI campuses. The role involves driving technical triage and resolution of hardware issues, collaborating with various teams, and establishing operational standards for hardware maintenance.

Responsibilities

- Serve as OpenAI’s senior on-site hardware operations lead for server, GPU, storage, and rack-level infrastructure
- Drive technical triage and resolution of complex hardware failures impacting production systems
- Partner with Fleet Health Engineering to investigate recurring hardware issues, identify failure patterns, and improve fleet reliability
- Lead root cause analysis (RCA) efforts for critical hardware incidents and develop corrective and preventive action plans
- Collaborate with Oracle operations teams and OEM vendors to coordinate repairs, replacements, upgrades, and hardware lifecycle activities
- Establish and continuously improve hardware maintenance procedures, operational runbooks, and troubleshooting standards
- Analyze hardware failure trends and operational metrics to identify reliability risks and improvement opportunities
- Support new hardware introductions, validation activities, and production readiness reviews
- Coordinate spare parts strategy and inventory planning with supply chain and operations teams
- Partner with Hardware Engineering, Manufacturing, and Infrastructure teams to provide field feedback that improves future platform designs
- Develop scalable operational standards and best practices that can be deployed across future Stargate campuses
- Mentor technicians and partner teams on advanced troubleshooting methodologies and hardware operational excellence

Skills

- 8+ years of experience supporting large-scale datacenter hardware infrastructure, with experience in a senior technician, sustaining engineering, or hardware operations leadership role
- Deep expertise with server platforms, GPU systems, storage infrastructure, rack integration, and datacenter hardware architecture
- Strong experience diagnosing complex hardware failures and leading repair efforts in production environments
- Experience conducting root cause analysis and driving long-term corrective actions
- Strong understanding of hardware reliability engineering principles and fleet-health management
- Proven ability to partner effectively across engineering, operations, manufacturing, and vendor organizations
- Comfortable operating independently in high-priority production environments with significant operational responsibility
- Excellent written and verbal communication skills with the ability to influence technical and operational decisions
- Experience developing operational processes, maintenance standards, and technical documentation
- Ability to travel occasionally to support new campus deployments and operational readiness activities
- Experience supporting large-scale GPU clusters or AI/ML infrastructure environments
- Familiarity with fleet health systems, telemetry platforms, and hardware monitoring tools
- Experience with failure analysis methodologies such as FRACAS, RCCA, 5-Why, Fishbone, or FMEA
- Knowledge of Linux system administration and hardware validation workflows
- Experience supporting hyperscale datacenter operations or HPC environments
- Familiarity with server manufacturing, rack integration, or NPI-to-sustaining transitions
- Industry certifications such as CompTIA Server+, OEM hardware certifications, or equivalent experience
- Experience applying Environmental Health and Safety (EHS) practices in mission-critical datacenter environments

Company Overview

OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation. It was founded in 2015 and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.openai.com.

