Python Infrastructure Engineer — Model Evaluation (AI Training)
About The Role
What if your Python expertise could directly shape the systems that power next-generation AI models? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation infrastructure that leading AI labs depend on to train and benchmark their models.
This is a high-impact, fully remote contract role working on real production systems — not toy projects. You'll collaborate directly with data, research, and engineering teams at the frontier of AI development.
• Organization: Alignerr
• Type: Hourly Contract
• Location: Remote
• Commitment: 20–40 hours/week
What You'll Do
• Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
• Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
• Build and maintain evaluation harnesses that integrate with inference frameworks and benchmark AI model performance
• Improve reliability, performance, and safety across existing Python codebases
• Instrument systems with observability tooling — metrics, logging, and monitoring to track system reliability and model performance
• Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
• Collaborate in synchronous design reviews to iterate on architecture and implementation decisions
Who You Are
• Native or fluent English speaker with strong written and verbal communication skills
• Full-stack developer with a solid systems programming background in Python
• 3–5+ years of professional experience writing production-grade Python
• Experienced in building evaluation harnesses for ML models and integrating them with inference frameworks
• Strong understanding of observability and metrics collection for monitoring system and model performance
• Able to commit 20–40 hours per week with reliability and focus
Nice to Have
• Prior experience with data annotation platforms, data quality systems, or evaluation pipelines
• Familiarity with AI/ML workflows, model training, or benchmarking infrastructure
• Experience with distributed systems or developer tooling at scale
• Background in MLOps, data engineering, or research engineering environments
Why Join Us
• Work on cutting-edge AI projects alongside leading research labs at the frontier of the field
• Fully remote and async-friendly — work from wherever you do your best work
• Freelance autonomy combined with meaningful, high-impact engineering work
• Make a direct, tangible contribution to the systems that shape how AI models are built and evaluated
• Potential for ongoing work and contract extension as new projects launch