[Remote] Senior Machine Learning Operations Engineer
Note: This is a remote position open to candidates in the USA.

Paramount is on a mission to unleash the power of content and is seeking a Senior Machine Learning Operations Engineer to oversee the operational layer of its ML systems. The role involves ensuring model traceability, building monitoring systems, and collaborating with Data Engineering to maintain data quality and reliability.

Responsibilities
- Own model traceability: Every model in production should have clear lineage: what data trained it, what code produced it, what validation it passed, and how it's performing. Evaluate and recommend tooling for versioning, metadata, and model registry, and work with MLEs to drive adoption.
- Build end-to-end monitoring: Monitor the full signal path: data arrival, feature distribution stability, model metrics, and serving latency against SLA. Own this independently; don't rely solely on upstream teams to catch their own issues.
- Partner with Data Engineering on data quality: Collaborate to surface data quality issues, detect drift in upstream sources, and ensure features stay fresh and reliable.
- Detect issues proactively: Track drift over weeks, flag slow degradation before it crosses a threshold, and surface feature freshness problems before they cascade.
- Build diagnostic tooling: When something goes wrong, get from "recommendations look off" to root cause in minutes. That means ensuring the right context is logged at each stage (candidates, features, serving context) and building the dashboards to tie it together.
- Own incident response for ML systems: Maintain rollback playbooks and pre-defined hotfix strategies with quantified tradeoffs. Own automated gates that block bad deployments. Run post-mortems and close the gaps.
- Coordinate on post-deployment metrics: Work with ML engineers, data engineers, and stakeholders to define what metrics to collect after deployment and why they matter.

Skills
- 5+ years in ML engineering, applied ML, or a related ML role, with demonstrated experience on the operational side: monitoring, reliability, deployment, or incident response
- Has built or operated model registries, ML monitoring systems, or production ML pipelines
- Understands ML systems end-to-end: not just the infra layer, but why a stale feature or a shifted distribution matters
- Strong SQL skills and comfort digging into data distributions, feature health, and model behavior
- Comfortable partnering with DevOps and Platform teams to define infrastructure needs without needing to own the infra yourself
- Experience operating recommendation or personalization systems at scale

Benefits
- Medical
- Dental
- Vision
- 401(k) plan
- Life insurance coverage
- Disability benefits
- Tuition assistance program
- PTO
- This position is bonus eligible.
- Generous paid time off.
- Opportunities for both on-site and virtual engagement events.
- Unique opportunities to make meaningful connections and build a vibrant community, both inside and outside the workplace.

Company Overview
Paramount is a leading media and entertainment company that creates premium content and experiences for audiences worldwide. Founded in 1914, it is headquartered in New York, New York, USA, and has more than 10,000 employees. Its website is https://www.paramount.com.

Company H-1B Sponsorship
Paramount has a track record of offering H-1B sponsorships, with 2 in 2024. Please note that this does not guarantee sponsorship for this specific role.