Machine Learning Engineer, Data & Training Infrastructure

Remote, Full-time
Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand.
We started from a different premise than the rest of the field: voice AI isn't bottlenecked by model architecture. It's bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That's our moat. It's also why enterprises pick Rime when pilots need to convert into production.
We're backed by top-tier investors including Unusual Ventures, and we've built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it.

Role Overview
We're hiring a Machine Learning Engineer to own the operational data pipeline end-to-end. The role requires "T-shaped" expertise: depth in data and orchestration fundamentals, plus the ability to coordinate everything that touches the data, from the upload interface through audio preprocessing (VAD, ASR), annotation, and training data export to evaluation.

What You'll Own
End-to-end audio annotation pipeline: some stages exist today only as prototypes, and the work of rebuilding and productionizing them is already in flight.

Quality systems: Automated tooling to catch annotation errors, alignment drift, and silent regressions before training runs.

Dataset versioning and experimenter tooling: the model team will want to subset the vetted pool ("speakers X/Y/Z, duration 3–12s, quality > 0.8") into reproducible training manifests. The query interface, manifest format, and lineage tracking are all yours.

Linguist- and annotation-team-facing tooling: annotation UI, project-management workflow, QC dashboards.

Pipelines for full- and half-duplex training data
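
To make the dataset versioning item above concrete, here is a minimal sketch of turning a query like "speakers X/Y/Z, duration 3–12s, quality > 0.8" into a reproducible training manifest. Everything in it is an assumption for illustration: the `Clip` record, its field names, the `build_manifest` function, and the choice of a SHA-256 content hash as the manifest ID are hypothetical and do not describe Rime's actual schema or tooling.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Clip:
    # Hypothetical record for one vetted clip; field names are illustrative.
    clip_id: str
    speaker: str
    duration_s: float
    quality: float


def build_manifest(pool, speakers, min_s, max_s, min_quality):
    """Filter the pool and return a manifest with a deterministic content hash."""
    selected = sorted(
        (c for c in pool
         if c.speaker in speakers
         and min_s <= c.duration_s <= max_s
         and c.quality > min_quality),
        key=lambda c: c.clip_id,  # stable order, independent of input order
    )
    query = {
        "speakers": sorted(speakers),
        "min_s": min_s,
        "max_s": max_s,
        "min_quality": min_quality,
    }
    body = {"query": query, "clips": [asdict(c) for c in selected]}
    # Canonical JSON (sorted keys) so the same selection always hashes the same.
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"manifest_id": digest, **body}


# Made-up example pool, not real data.
pool = [
    Clip("a1", "X", 5.0, 0.9),
    Clip("a2", "X", 2.0, 0.95),   # too short
    Clip("a3", "Q", 6.0, 0.99),   # speaker not requested
    Clip("a4", "Y", 11.5, 0.7),   # quality too low
    Clip("a5", "Z", 12.0, 0.81),
]
manifest = build_manifest(pool, {"X", "Y", "Z"}, 3.0, 12.0, 0.8)
print([c["clip_id"] for c in manifest["clips"]])  # ['a1', 'a5']
```

Storing the query alongside the selected clips gives a simple form of lineage, and hashing the canonical JSON means two experimenters running the same query against the same pool get the same manifest ID. A production version would presumably also pin the version of the underlying pool.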

What We're Looking For
Strong software engineering fundamentals: Python, distributed systems, comfort across the stack.

Database design fluency: you reach for the right schema and have operated Postgres or similar in production.

Experience building production data pipelines on cloud-native infrastructure (GCP preferred; our data stack is currently GCP-dominant).

Operational comfort: containers, CI/CD, IAM, cost-aware infrastructure choices, etc.

Strong attention to detail on data quality.

Comfort being out of your depth at the boundary. You'll sometimes debug code you didn't write in tools you don't use daily. You should find this energizing, not threatening.

Bias toward building abstractions, so the modeling team isn't stuck doing data work by hand.

Nice to have
Multilingual data pipeline experience.

Audio DSP, signal processing, or speech recognition background.

Large-scale training infra (FSDP, DeepSpeed, Ray).

Annotation tooling and human-in-the-loop systems.

Comfort working close to research teams.

Why Join Rime
Build the data infrastructure behind a category-defining voice AI company.

The pipelines you build determine what models we can train.

Meaningful equity upside.

High ownership, high standards, low bureaucracy.

What We Offer
Competitive base + meaningful early-stage equity

Remote-friendly

Visa sponsorship available

Access to a proprietary, full-duplex, studio-quality conversational speech corpus

Compute and tooling to do the work

Direct influence on the future of voice AI

At Rime, we...
Are outliers

Cut through the hype to focus on the craft

Move fast with agency and freedom

Maintain a growth mindset, finding joy in the struggle

Do the right thing, knowing that it'll lead to making money

If that sounds like you too, you'll be a great fit for Rime!
