Member of Engineering – Pre-training, Data Engineering

Remote Full-time
Job Description:
• Build and maintain high-performance pipelines for trillions of tokens.
• Deliver diverse, high-quality datasets for pre-training foundation models.
• Work closely with other teams, such as Pretraining, Posttraining, Evals, and Product, to ensure alignment on the quality of the models delivered.

Requirements:
• Strong background in building production-grade, distributed data systems for machine learning, with experience in:
  • Orchestration: Slurm, Airflow, or Dagster
  • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
  • Infra: Git, Docker, k8s, cloud managed services
  • Batched inference (e.g., vLLM)
• Performance obsession, especially with large-scale GPU clusters and distributed pipelines
• Expert-level Python knowledge and the ability to write clean, maintainable code
• Strong algorithmic foundations
• Proficiency with libraries like Polars, Dask, or PySpark
• Nice to have:
  • Experience in building trillion-scale SOTA pretraining datasets
  • Experience translating research to production at scale
  • Experience with OCR, web crawling, or evals
  • Prior experience pre-training LLMs

Benefits:
• Fully remote work & flexible hours
• 37 days/year of vacation & holidays
• Health insurance allowance for you and dependents
• Company-provided equipment
• Wellbeing, always-be-learning and home office allowances
• Frequent team get-togethers
• Great, diverse & inclusive people-first culture
