Founding ML Infrastructure Engineer

Remote Full-time
The problem we saw
Most AI infrastructure is built for batch: send a query, wait, get a response, reset. Powerful, but transactional. AI is becoming interactive — sessions that hold state, models that stay alive between turns, generation that responds as it runs — and the infrastructure to deliver that at scale doesn't really exist yet.
The bottleneck isn't the models anymore. It's the infrastructure underneath them.
What we're building to fix it
uRun is the inference cloud for interactive AI: the compute layer that makes real-time, stateful inference possible at scale. We came out of stealth in April 2026 and are backed by top-tier investors. Our founder, Keegan McCallum, scaled inference infrastructure for some of the most demanding generative AI workloads in production.
We're an infrastructure company. We build the layer that model labs, builders, and research teams ship on top of.
Where you come in
We are building the next generation of AI inference infrastructure. As our Founding ML Infrastructure Engineer, you will own the architecture and scaling of our GPU compute platform from the ground up.
This is a founding technical hire with end-to-end ownership across the full infrastructure stack, from bare metal to model serving. You will work directly with the founding team and define how we build.

What you'll actually be doing day-to-day
Design and scale our GPU compute platform to support 1,000+ GPU clusters, ensuring high availability and low-latency inference across the fleet

Build and maintain the infrastructure layer for our compute marketplace, including multi-tenant scheduling, isolation, and billing-aware resource allocation

Own production reliability for ML systems end-to-end: observability, incident response, and SLA achievement across model serving and infrastructure

Architect feature stores and model registry systems that support rapid iteration and reproducibility at scale

Design experiment-tracking infrastructure capable of handling thousands of concurrent runs with full auditability

Build resource orchestration and scheduling systems that optimise for throughput, cost, and latency across heterogeneous hardware

Set engineering standards for infrastructure reliability, capacity planning, and operational excellence as an early technical leader

What skills you need for the journey
Proven experience designing and operating large-scale distributed infrastructure at 1,000+ nodes or equivalent complexity, in any domain

Deep expertise in distributed systems, cluster orchestration (Kubernetes, Slurm, or custom schedulers), and large-scale resource scheduling

Strong production reliability instincts: observability, incident response, capacity planning, and SLA ownership across complex systems

Experience building infrastructure that other engineers build on top of, not just operating it

Ability to operate as a technical lead: set direction, make tradeoffs under uncertainty, and raise the bar for the team around you

Startup orientation: you are energised by ambiguity, move fast, and build for scale from day one

Things that will give you an edge
Exposure to ML infrastructure concepts: GPU networking (NCCL, InfiniBand, RoCE), model serving frameworks (vLLM, SGLang, TensorRT-LLM), or hardware-aware performance tuning (CuTe, Triton, TileLang)

Experience with multi-cloud GPU procurement and capacity management across AWS, GCP, Azure, and bare metal providers

Familiarity with inference marketplace architectures, dynamic routing, or spot/preemptible workload management

Prior experience at a Series A or earlier stage company scaling from early infrastructure to production

What you'll get in return
Competitive salary and meaningful equity in an early-stage AI infrastructure company. The band above is our target; for an exceptional candidate we'll go higher. Equity is real — you're early, and the grant reflects that.
Health, dental, and vision — full coverage

401(k) — company-supported retirement savings

FSA/HSA — flexible spending accounts for healthcare costs

Paid time off — we trust you to manage your time

Top-tier tooling — access to the best AI tools available: Claude, Codex, Kimi, and whatever else helps you move faster

MacBook Pro and AirPods — the hardware you need, on us

How we work (and what that feels like day-to-day)
We build the stage, not the show. We're an infrastructure company, a developer-tools company, and a production partner for model labs — and focus is a deliberate choice we've made and hold to.
Day-to-day, that means a small team, a high bar, and real ownership. You won't wait for permission or inherit a backlog of someone else's decisions. In a founding infrastructure role, the function is what you make it.
It also means ambiguity: priorities shift, not everything is documented, and you'll often be the person who decides what "good enough for now" means. That suits some people and not others, and we'd rather you know that before you apply.
