[Remote] Senior Machine Learning Engineer, AI Platform
Note: This is a remote position open to candidates in the USA. Mozilla Corporation is a non-profit-backed technology company that aims to improve the internet. They are seeking a Senior Machine Learning Engineer to design, build, and operate their AI platform, focusing on model training pipelines and secure AI systems.

Responsibilities
- Design, build, and operate core AI platform components used to train, deploy, and serve machine learning models in production environments
- Own model serving and inference workflows end-to-end, driving improvements in reliability, scalability, performance, and operational excellence
- Lead efforts to optimize inference systems for throughput, latency, and cost efficiency across CPU and GPU workloads
- Design and manage GPU-based inference and training workloads, including performance tuning, capacity planning, and resource utilization optimization
- Own and improve critical parts of the model lifecycle, including packaging, versioning, testing strategies, validation, and deployment automation
- Implement and evolve observability practices (metrics, logging, tracing, alerting) to improve visibility and operational resilience of ML services and pipelines
- Partner closely with product, infrastructure, security, and data teams to design scalable platform capabilities that enable AI-powered features
- Contribute to technical design discussions, propose architectural improvements, and mentor junior engineers through code reviews and knowledge sharing
- Participate in and help improve operational processes, including incident response, on-call rotations, and post-incident reviews

Skills
- Bachelor's degree with 4–6 years of relevant industry experience, or Master's degree with significant hands-on experience building and operating production ML systems, or equivalent work experience
- Strong experience developing in Python for machine learning systems, backend services, or distributed data processing
- Proven experience deploying and operating ML workloads in cloud environments, including production-grade infrastructure
- Solid understanding of model serving architectures, inference pipelines, and performance tradeoffs (latency, throughput, cost, scaling strategies)
- Hands-on experience working with GPU-based workloads and accelerated computing in production settings
- Experience designing CI/CD pipelines and development workflows that support reliable ML system deployment
- Ability to independently scope and drive technical initiatives while balancing product and operational priorities
- Strong problem-solving skills and the ability to debug performance and reliability issues in distributed systems
- Clear and effective communication skills, with experience collaborating across engineering, product, and infrastructure teams
- Experience implementing inference optimization strategies such as batching, quantization, compilation, model conversion, or hardware-specific tuning
- Familiarity with containerization and orchestration systems (e.g., Docker, Kubernetes) in production environments
- Experience designing observability systems for distributed services, including metrics strategy and performance profiling
- Exposure to privacy-preserving ML techniques, security best practices, or responsible AI system design
- Contributions to open-source ML infrastructure projects or leadership in building reusable internal ML tooling

Benefits
- Generous performance-based bonus plans for all eligible employees; we share in our success as one team
- Rich medical, dental, and vision coverage
- Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
- Quarterly all-company wellness days where everyone takes a pause together
- Country-specific holidays plus a day off for your birthday
- One-time home office stipend
- Annual professional development budget
- Quarterly well-being stipend
- Considerable paid parental leave
- Employee referral bonus program
- Other benefits (life/AD&D, disability, EAP, etc.; varies by country)

Company Overview
Mozilla provides internet solutions and offers Firefox, Thunderbird, and Raindrop. It was founded in 1998 and is headquartered in Mountain View, California, USA, with a workforce of 501-1000 employees. Its website is https://www.mozilla.org.

Company H1B Sponsorship
Mozilla has a track record of offering H1B sponsorships, with 2 in 2025, 5 in 2024, 4 in 2023, 3 in 2022, 2 in 2021, and 6 in 2020. Please note that this does not guarantee sponsorship for this specific role.