Software Engineer L5, Model Observability & Lifecycle Management, Machine Learning Platform

Remote, Full-time
Netflix is one of the world’s leading entertainment services with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

The Role

The Model Observability & Lifecycle Management team’s centralized MLOps platform multiplies the productivity of both the Machine Learning Platform (MLP) organization and all ML practitioners across Netflix. We maintain the reliability of ML applications by building systems to catch and diagnose issues as soon as possible, sometimes before they even happen!

We’re building a comprehensive and centralized system for managing ML models, featuring capabilities like visualization, observability, and performance benchmarking. Our paved path for MLOps will reduce redundancy, minimize operational overhead, and offer standardized workflows and UIs to researchers and infrastructure engineers throughout the company.

We seek strong engineers to develop and expand our model observability and visualization workflows to support bandits, multi-task learning models, Large Language Models (LLMs), and other foundation models. Our tools and systems enable hundreds of ML practitioners to develop some of Netflix’s most business-critical models across personalization, growth and commerce, ads, and studio algorithms. You will play a highly cross-functional role, partnering with other engineers, product managers, machine learning engineers, and data/research scientists to elevate our ML/AI initiatives and drive impactful innovation.

Snapshot Of Projects You May Work On
• Observability dashboard and corresponding backend system to integrate with various MLP products to enable ML practitioners to explore and discover ML entities (models, features, embeddings, pipelines, etc.) and monitor and operate them effectively
• Model registry to catalog ML models and their versions to enable discoverability, including core model store functionality with an API backend and an SDK integration layer
• Collaborate with cross-functional teams to implement anomaly and drift detection on models, features, embeddings, etc., automatically detecting and alerting on staleness and quality issues and suggesting or implementing fixes
• Cost monitoring and chargeback dashboards to provide visibility into resource utilization and identify opportunities for efficiency improvements
• Enhance our user interfaces to provide intuitive and seamless experiences for ML practitioners, incorporating feedback and best practices to improve usability and adoption

We Would Love To Work With You If
• You have experience building backend distributed systems and full-stack systems using object-oriented programming (preferably Java), web API frameworks (preferably Spring Boot), and UI frameworks like React.
• You have experience working with a public cloud such as AWS, Azure, or GCP.
• You have knowledge of ML model lifecycle management and MLOps best practices to support end-to-end development, deployment, and monitoring of ML models.
• You proactively communicate with cross-functional teams to drive projects and promote best practices in observability and logging.
• You have a BS/MS in Computer Science, Applied Math, Engineering, or a related field.

Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to place you within the market range. The range for this role is $100,000 - $720,000.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity of thought and background builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

This job will remain open for no less than 7 days and will be removed when the position is filled.
