Senior Machine Learning Engineer - Hardware Abstractions & Performance Optimization

Remote, Full-time
Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. To go beyond language models and build more aware, capable and useful systems, the next step-function change will come from vision. So, we are working on training and scaling up multimodal foundation models for systems that can see and understand, show and explain, and eventually interact with our world to effect change.

We are looking for engineers with significant experience designing and maintaining highly efficient systems and code that can be optimized to run on multiple hardware platforms, bringing our state-of-the-art models to as many people as possible at the best performance per dollar.

Responsibilities

- Ensure efficient implementation of models and systems, with a focus on designing, maintaining, and writing abstractions that scale beyond NVIDIA/CUDA hardware.
- Identify and remedy efficiency bottlenecks (memory, speed, utilization, communication) by profiling and implementing high-performance PyTorch code, dropping down to Triton or similar kernel-level languages as necessary.
- Benchmark our products across a variety of hardware and software to help the product team understand the optimal tradeoffs between latency, throughput, and cost at various degrees of parallelism.
- Work with our partners to help them identify bottlenecks and push forward new iterations of hardware and software.
- Work closely with the rest of the research team to ensure systems are planned to be as efficient as possible from start to finish, and raise potential issues for hardware integration.

Must have experience

- Experience optimizing for memory, latency, and throughput in PyTorch. Bonus: experience with non-NVIDIA systems.
- Experience using torch.compile / torch.XLA.
- Experience benchmarking and profiling GPU and CPU code in PyTorch for optimal device utilization (e.g., torch profiler, memory profilers, trace viewers, custom tooling).
- Experience building tools and abstractions to ensure models run optimally on different hardware and software stacks.
- Experience working with transformer models and attention implementations.
- Experience with parallel inference, particularly tensor parallelism and pipeline parallelism.

Good to have experience

- Experience with high-performance Triton/CUDA and writing custom PyTorch kernels and ops. Top candidates will be able to write fused kernels for common hot paths, understand when to use lower-level features like tensor cores or warp intrinsics, and understand where these tools can be most impactful.
- Experience writing high-performance parallel C++. Bonus if done in an ML context with PyTorch, e.g., for data loading, data processing, or inference code.
- Experience building inference and demo prototype code (e.g., Gradio, Docker).
