[Remote] Principal Machine Learning Engineer, ML Platform

Remote, Full-time
Note: This is a remote position open to candidates in the USA. Shippo is on a mission to make every merchant successful through excellent shipping and logistics technology. The company is seeking a Principal Machine Learning Engineer for its ML Platform team to build a standardized, production-grade ML platform that improves model reliability and speeds up product development.

Responsibilities
• Set technical strategy and drive a multi-quarter roadmap for ML platform capabilities aligned to Shippo’s business priorities
• Own cross-team architecture decisions, RFCs, and design reviews for ML lifecycle and inference
• Raise the engineering bar through mentorship, production readiness standards, and reusable platform primitives
• Be accountable for platform adoption, reliability, and cost-performance outcomes
• Build and operate core ML platform components:
    • ML lifecycle foundation (experiment tracking, reproducibility, artifact management, model registry, versioning, and controlled promotion workflows using MLflow or equivalent)
    • Training and experimentation enablement (standardized environments, reusable pipelines/templates, evaluation harnesses, and repeatable workflows that let data scientists move from exploration to production with confidence)
    • Kubernetes-native model serving for real-time inference (safe rollout and rollback, autoscaling, reliability practices, and cost controls)
    • Batch inference and scoring pipelines (repeatable backfills, retraining triggers, consistent packaging between training and inference)
    • Observability for ML systems (service health metrics, alerting, and model-quality signals such as drift and data quality)
    • Developer experience (templates, reference implementations, documentation, and self-service workflows)
• Evaluate and recommend inference frameworks and deployment patterns, and document tradeoffs for Shippo’s workloads
• Identify and resolve performance bottlenecks across the inference stack (model runtime, compute utilization, networking, serialization, and autoscaling behavior)
• Establish ML engineering standards across training, evaluation, testing, model packaging, CI/CD, production readiness, and incident response
• Partner with Data Science teams to bridge research and production environments by creating repeatable frameworks, shared standards for code quality and reproducibility, and self-serve paths to deploy models safely
• Collaborate with Data and Engineering teams to ensure the platform supports real workflows, drives adoption, and meets reliability expectations
• Mentor engineers through design reviews, architecture guidance, and shared best practices across platform and ML development

Skills
• 15+ years of software engineering experience, including ownership of production systems (platform, infrastructure, or distributed systems)
• 4+ years owning ML systems end-to-end in production, including on-call and incident response, and making architecture decisions based on operational constraints (latency, throughput, availability, and cost)
• Strong experience building and running services on Kubernetes, including deployments, autoscaling, and observability
• Hands-on experience with ML lifecycle tooling such as MLflow or equivalent (tracking, registry, packaging, and promotion workflows)
• Demonstrated ability to evaluate inference tradeoffs across batch and real-time serving, CPU versus GPU, latency and throughput, cost, and operational complexity
• Demonstrated Principal-level technical leadership, including setting technical direction, driving cross-team alignment via RFCs/design reviews, and delivering multi-quarter roadmaps
• Proven ownership of reliability and operational outcomes for production systems (SLOs, incident response, and measurable improvements in stability and performance)
• Demonstrated ability to ship incrementally, prioritize production reliability over perfect solutions, and drive adoption through pragmatic platform design
• Experience working with or evaluating managed ML platforms (Databricks, SageMaker, Vertex AI, or similar), with clear judgement on strengths, limitations, and build-vs-buy decisions
• Databricks experience (useful, not required), including Databricks workflows and ML tooling integration
• Experience with inference and serving frameworks
• Experience with feature store patterns, online and offline consistency, and model evaluation at scale
• Experience supporting optimization systems and decision engines in production
• LLM or agent workflow experience, especially evaluation harnesses, deployment patterns, guardrails, and monitoring

Benefits
• Healthcare coverage for medical, dental, and vision (90% covered by the company, incl. dependents). Pets coverage is also avail