[Remote] Sr. Engineering Manager, MLOps

Remote Full-time
Note: This is a remote job open to candidates in the USA. Quince is a tech company disrupting the retail industry by leveraging AI, analytics, and automation. They are seeking a Senior Engineering Manager, MLOps to build and scale the infrastructure that supports production-grade Machine Learning, ensuring seamless operations for their Data Scientists and AI Researchers.

Responsibilities

Define the MLOps Vision & Strategy: Architect a long-term roadmap that transitions ML workflows from manual scripts to a fully automated, self-service platform for all Quince Data Scientists and AI Researchers.

Own the "Paved Road" for Production: Build and maintain the end-to-end infrastructure for model training, deployment, and serving, ensuring researchers can move from "idea to production" with zero friction.

Drive Strategic Prioritization: Partner with business leaders to align infrastructure investments with core e-commerce drivers like real-time personalization, dynamic pricing, and inventory forecasting.

Lead "Build vs. Buy" Evaluations: Make high-judgment decisions on when to leverage cloud-native services (e.g., SageMaker, Vertex AI) versus building custom internal tools to optimize for cost, speed, and flexibility.

Guarantee System Scalability & Reliability: Oversee the uptime and performance of production ML services, ensuring the stack can handle massive traffic surges and seasonal spikes without degradation.

Manage Compute Governance & Costs: Direct the optimization of high-cost computational resources, such as GPU clusters and cloud instances, balancing high-performance training needs with fiscal responsibility.

Recruit and Mentor Top Talent: Build and lead a high-performing team of ML Infra and DevOps engineers, providing technical coaching, career pathing, and performance management.

Establish MLOps Standards: Drive the adoption of best practices in CI/CD for ML, Infrastructure as Code (IaC), and automated testing to ensure a modular and maintainable system.

Bridge the Research-Engineering Gap: Act as the primary cross-functional lead, translating the complex needs of AI Researchers into actionable engineering requirements for the infrastructure team.

Define and Track Velocity Metrics: Establish KPIs for the infrastructure team, such as model deployment frequency, mean time to recovery (MTTR), and infrastructure cost per inference.

Champion Operational Excellence: Lead root-cause analyses (RCAs) for production failures and foster a culture of accountability where systemic fixes are prioritized over "quick patches."

Stay Ahead of the AI Curve: Monitor emerging trends in LLM-ops, vector databases, and real-time feature engineering to ensure Quince’s infrastructure remains competitive and future-proof.

Skills

10+ years of industry experience, with at least 3-5 years in a leadership or management role specifically focused on ML Infrastructure, MLOps, or large-scale Data Platform engineering.

Proven track record of building and scaling MLOps platforms that support the full model lifecycle, from data ingestion and distributed training to real-time inference and monitoring.

Deep technical expertise in cloud-native infrastructure (preferably AWS) and orchestration tools like Kubernetes (EKS), Docker, and Infrastructure as Code (Terraform/Pulumi).

Hands-on experience with ML frameworks and tooling, such as PyTorch, TensorFlow, Kubeflow, or SageMaker, and a strong opinion on how to integrate them into a cohesive developer experience.

Expertise in building and managing Feature Stores and high-throughput data pipelines (using tools like Spark, Flink, or Kafka) to ensure data consistency across training and serving.

Experience partnering with AI Research and Data Science teams to understand their unique workflows and translate research needs into robust, scalable engineering solutions.

Strong understanding of CI/CD for ML, including automated testing for models, model versioning, and 'blue-green' or 'canary' deployment strategies.

Demonstrated ability to manage high-cost compute resources, with experience optimizing GPU utilization and cloud spend in a hyper-growth environment.

Excellence in operational leadership, with a history of driving service availability, performance, and stability through rigorous on-call rotations and root-cause analysis.

A product-oriented mindset, with the ability to treat infrastructure as a platform and prioritize the roadmap based on researcher velocity and business ROI.

Exceptional communication and influence skills, capable of navigating ambiguity and building consensus across engineering, product, and data science leadership.

Kindness and high standards: You move fast and push for excellence, but you do so as a supportive team player who fosters a culture of psychological safety and extreme candor.

Benefits

Bonus and equity may also be provided for eligible roles.

Company Overview

Quince is an e-commerce company that offers apparel, accessories, home goods, and personal care products through an online platform. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.quince.com.

Company H1B Sponsorship

Quince has a track record of offering H1B sponsorships, with 1 in 2023. Please note that this does not guarantee sponsorship for this specific role.

