Python Developers - US

Remote Full-time
Work Location: Remote, within the US

Engagement Model: Freelancer/Independent Contractor

Start Date: ASAP

DataForce by TransPerfect is looking for skilled Python Developers to architect, build, and own the data pipelines that power large language model (LLM) development.

Your primary mission will be to build scalable, automated systems that transform massive raw datasets into clean, model-ready formats. While your focus will be on data engineering, your expertise will also be valuable in collaborating on model training runs and experiments.

You are a strong fit for this role if you are a Python expert who thrives on solving large-scale data challenges and enjoys working at the intersection of data engineering and machine learning.

Role Responsibilities
• Design, develop, and own robust, scalable, and automated ETL/ELT pipelines in Python to ingest and process terabyte-scale text datasets.
• Implement rigorous data cleaning, deduplication, filtering, and normalization strategies, and define and enforce data quality standards to ensure high integrity for model training.
• Efficiently structure and format diverse datasets (e.g., JSON, Parquet) for consumption by LLM training frameworks.
• Work closely with AI researchers and ML engineers to understand data requirements, define metrics, and support the model training lifecycle.
• Continuously optimize data processing workflows for performance, cost efficiency, and reliability.
• Occasionally assist with launching, monitoring, and debugging data-related issues during model training runs.

Role Requirements
• 5–10 years of professional experience in Python development, data engineering, data processing, or backend software engineering.
• Expert-level proficiency in Python and its data ecosystem (e.g., Pandas, NumPy, Dask, Polars).
• Proven experience building and maintaining large-scale data pipelines.
• Deep understanding of data structures, data modeling, and software engineering best practices (Git, CI/CD, testing).
• Experience handling and parsing diverse data formats (JSON, CSV, XML, Parquet) at scale.
• Excellent problem-solving skills and meticulous attention to detail.
• Strong communication and collaboration skills, with experience working in a team environment.

Preferred Role Requirements
• Hands-on experience with the data preprocessing pipeline for an LLM (e.g., LLaMA, BERT, GPT-family).
• Experience with big data frameworks like Apache Spark or Ray.
• Experience with Hugging Face libraries (Transformers, Datasets, Tokenizers).
• Familiarity with ML frameworks like PyTorch or TensorFlow.
• Proficiency with cloud platforms (AWS, GCP, Azure) and their data/storage services.

DataForce by TransPerfect is part of the TransPerfect family of companies, the world’s largest provider of language and technology solutions for global business, with offices in more than 100 cities worldwide.

We offer high-quality data for Human-Machine Interaction to some of the most prestigious technology companies in the world. Our department focuses on gathering, enriching, and processing data for Machine Learning in different AI domains. To learn more about DataForce, please visit us at https://www.transperfect.com/dataforce.

TransPerfect provides equal employment opportunity to all individuals regardless of their race, color, creed, religion, gender, age, sexual orientation, national origin, disability, veteran status, or any other characteristic protected by state, federal, or local law. For more information on the TransPerfect Family of Companies, please visit our website at www.transperfect.com.

About the Company:
DataForce by TransPerfect
