Deep Learning Software Engineer, Inference and Model Optimization - New College Grad 2025

Remote Full-time
About the position

NVIDIA is at the forefront of the generative AI revolution! The Algorithmic Model Optimization Team focuses on optimizing generative AI models, such as large language models (LLMs) and diffusion models, for maximal inference efficiency using techniques ranging from neural architecture search and pruning to sparsity, quantization, and automated deployment strategies. Our work includes conducting applied research to improve model efficiency as well as developing an innovative software platform (TRT Model Optimizer). Our software is used both internally across NVIDIA and externally by research and engineering teams developing best-in-class AI models. We are now looking for a Deep Learning Software Engineer to develop and scale up our automated inference and deployment solution. As part of the team, you will be instrumental in pushing the limits of inference efficiency and large-scale, automated deployment. Your work will touch upon fundamental aspects of a typical machine learning stack, from working in high-level frameworks like PyTorch and HuggingFace to developing and improving high-performance kernel implementations in CUDA, TRT-LLM, and Triton.

Responsibilities
• Train, develop, and deploy state-of-the-art generative AI models like LLMs and diffusion models using NVIDIA's AI software stack.
• Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary PyTorch models for our automated deployment solution.
• Develop high-performance optimization techniques for inference, such as automated model sharding techniques (e.g., tensor parallelism, sequence parallelism), efficient attention kernels with KV caching, and more.
• Collaborate with teams across NVIDIA to use performant kernel implementations within our automated deployment solution.
• Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
• Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and increase their leadership in the market.
• Play a pivotal role in architecting and designing a modular and scalable software platform that provides an excellent user experience, with broad support for models and optimization techniques, to increase adoption.

Requirements
• Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
• Experience in Deep Learning.
• Excellent software design skills, including debugging, performance analysis, and test design.
• Strong proficiency in Python, PyTorch, and related ML tools (e.g. HuggingFace).
• Strong algorithms and programming fundamentals.

Nice-to-haves
• Contributions to PyTorch, JAX, or other machine learning frameworks.
• Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.
• Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
• Experience in writing high-performance GPU kernels for machine learning workloads in frameworks such as CUDA, CUTLASS, or Triton.

Benefits
• Highly competitive salaries
• Comprehensive benefits package
• Equity opportunities

