[Remote] Forward Deployed Engineer: AI + HPC

Remote Full-time
Note: This is a remote position open to candidates in the USA.

Cedana is a company focused on maximizing AI and HPC cluster utilization and reliability. As a Forward Deployed Engineer, you will lead technical engagements with customers, deploying Cedana's solutions in various environments and optimizing platform performance.

Responsibilities

- Engineer solutions at client sites: Lead customer integrations. Install, configure, and deploy Cedana into SLURM, Kubernetes, and Dynamo environments
- Drive product innovation from the field: Identify technical gaps while embedded with clients, then provide product feedback for new capabilities that become core product features
- Measure and optimize platform performance: Measure reliability, throughput, and performance using our internal tools. Design and implement policy-based migration automations to optimize reliability, throughput, and performance
- Own critical deployments: Ensure our platform performs reliably for clients' critical operations, debugging issues across the full stack. Debug install issues against unfamiliar customer infrastructure, and escalate to engineering when necessary
- Improve scalability: Build and own the internal installation playbook so that the second customer in each segment is onboarded faster than the first
- Respect our customers: Understand how to make their lives easier and minimize their time and overhead

Skills

- Team management experience. Requires strong project and time management skills, delivering milestones on time, and effective
- 3-10 years of software engineering experience with a track record of configuring and managing SLURM deployments
- A multi-month enterprise or research deployment you led end-to-end, from scoping through signoff. You write effective status updates to keep your team updated and on schedule
- Production experience standing up SLURM in a customer or research environment. You've configured slurmctld, slurmdbd, accounting, cgroup integration, and GPU resource selection
- Strong Linux fundamentals: systemd, cgroups v2, namespaces, networking, filesystems, firewalls, kernel module loading, PAM session modules. You can read strace and dmesg output and form a hypothesis
- Experience with Kubernetes operations including operators, CRDs, CNIs, device plugins, and node-level debugging. You've debugged a controller in production even if you haven't written one from scratch
- Experience in an HPC integrator field team
- Client-facing technical experience working directly with customers
- Background in national lab user services or university research computing
- You've developed SLURM plug-ins, and understand their architecture and how they fit into the overall platform
- Familiarity with CRIU, container runtimes, GPU driver internals, distributed training stacks
- Hands-on with NVIDIA Dynamo, Determined, Ray, Kueue, KServe, or comparable AI orchestration
- Contributed to open-source schedulers or job systems (SLURM, Flux, Torque, PBS)
- A passion for debugging a weird cgroup issue at 11pm just as much as writing a clean install playbook the next morning

Benefits

- 100% covered medical, dental, and vision insurance for employees and families
- Unlimited PTO policy
- 401(k) plan

Company Overview

Cedana is VMware for GPUs. We enable enterprises to orchestrate and operationalize intelligence precisely, reliably, and efficiently. Cedana was founded in 2023 and is headquartered in New York, New York, USA, with a workforce of 2-10 employees. Its website is https://www.cedana.ai.
