Machine Learning Engineer, Inference

Remote, Full-time
Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand.
We started from a different premise than the rest of the field: voice AI isn’t bottlenecked by model architecture. It’s bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That’s our moat. It’s also why enterprises pick Rime when pilots need to convert into production.
We’re backed by top-tier investors including Unusual Ventures, and we’ve built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it.

Role Overview
We’re hiring a Machine Learning Engineer to own inference for Rime’s models in production. Voice is unforgiving because every millisecond shows up in the conversation. You’ll build the systems that turn our models into the lowest-latency, highest-throughput, most reliable speech systems in the industry.

What You’ll Own
In-house, real-time, speech-first inference stack: model compilation, kernel optimization, batching strategy, streaming output, and the full path from checkpoint to first audio byte.

Latency systems: time-to-first-byte (TTFB) targets across regions, KV cache management, speculative decoding, and scheduler design.

Deployment flexibility: cloud, on-prem, and BYOC (SageMaker, Connect), plus the packaging and runtime story across heterogeneous environments.

Inference for full- and half-duplex models, including streaming codec encoding and decoding.

What We’re Looking For
Strong software engineering fundamentals: Rust, Python, and C++/CUDA are all welcome, along with distributed systems experience and comfort across the stack.

Hands-on experience serving ML models at scale in production, ideally for low-latency or streaming workloads.

Deep familiarity with inference engines (such as vLLM and SGLang) and SDKs (such as TensorRT, ONNX, CUDA Graphs, and Triton).

Working knowledge of speech synthesis and/or speech recognition techniques.

Familiarity with multiple speech representations (neural codecs, semantic tokens, mel/STFT) and how they shape inference cost.

Experience optimizing transformer or autoregressive model inference: KV caching, quantization, paged attention, speculative decoding.

Willing to roll up your sleeves on unglamorous performance work (flame graphs, Nsight traces, kernel tuning), paired with the agency to build the abstractions so the team doesn’t stay stuck doing it by hand.

Bias toward shipping.

Nice to Have
CUDA kernel authoring or Triton experience.

GPU profiling and microarchitecture intuition (H100, A100, L40S, Blackwell).

Experience with parallel model training infrastructure.

Multi-tenant inference scheduling and fairness.

Comfort working close to research teams and influencing model architecture choices for inference-friendliness.

Why Join Rime
Build the inference stack behind a category-defining voice AI company.

Direct collaboration with founders, including a CEO with a Stanford computational linguistics PhD who takes latency as seriously as you do.

The systems you build determine what experiences our customers can deploy.

Meaningful equity upside.

High ownership, high standards, low bureaucracy.

What We Offer
Competitive base + meaningful early-stage equity

Remote-friendly

Visa sponsorship available

Access to a proprietary, full-duplex, studio-quality conversational speech corpus

Compute and tooling to do the work

Direct influence on the future of voice AI

At Rime, we...
Are outliers

Cut through the hype to focus on the craft

Move fast with agency and freedom

Maintain a growth mindset, finding joy in the struggle

Do the right things, knowing that it'll lead to making money

If that sounds like you too, you'll be a great fit for Rime!
