[Remote] AI/ML Research Engineer, LLM Post-Training & Evaluation
Note: This is a remote position open to candidates in the USA. Innodata Inc. is a leading data engineering company providing AI technology solutions to major technology firms and industries. The company is seeking an AI/ML Research Engineer to build and optimize the technical foundations for model improvement, with a focus on large language models and evaluation systems.
Responsibilities
• Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery
• Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking
• Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses
• Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows
• Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring
• Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift
• Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems
• Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions
• Contribute to internal research and platform development, including benchmark frameworks, evaluation tooling, and post-training workflow improvements
• Contribute to best practices and standards for LLM training, evaluation, and quality assurance across projects
• Mentor junior engineers and contribute to technical design reviews, documentation, and engineering rigor across the team
Skills
• BS/MS/PhD in Computer Science, Machine Learning, AI, Applied Mathematics, or a related quantitative technical field (MS/PhD preferred)
• 2-3 years of relevant industry or research engineering experience in ML/AI systems
• Hands-on experience with LLM training / fine-tuning / post-training, including at least one of: supervised fine-tuning (SFT), preference optimization (e.g., DPO or related methods), RLHF / RLAIF-style workflows, task- or domain-adaptation of foundation models
• Strong programming skills in Python and experience building production-quality ML code
• Experience with modern ML frameworks (e.g., PyTorch, JAX, TensorFlow) and model libraries/tooling (e.g., Hugging Face ecosystem, vLLM, distributed training stacks)
• Experience designing and implementing evaluation pipelines for LLM/ML systems, including metrics computation, dataset handling, and experiment comparisons
• Strong understanding of data pipelines and ML systems engineering, including reproducibility, observability, and debugging
• Experience with large-scale distributed ML systems and performance optimization for training/evaluation workloads (GPU/accelerator environments preferred)
• Experience with large-scale data processing and workflow orchestration in support of model training/evaluation
• Ability to collaborate directly with technical stakeholders including research scientists, ML engineers, data engineers, and customer technical leads
• Strong written and verbal communication skills, including the ability to explain complex technical tradeoffs to both technical and non-technical audiences
• Experience training, fine-tuning, and evaluating transformer-based models
• Understanding of post-training workflows and model iteration loops
• Familiarity with inference-time considerations (latency, throughput, memory/performance tradeoffs) where relevant to evaluation or deployment
• Experience implementing automated evaluation pipelines and test harnesses
• Experience with experiment tracking, versioning, and reproducibility practices
• Ability to assess metric quality and ensure consistency across model comparisons
• Proficiency in Python and strong software engineering fundamentals
• Experience with data processing pipelines, storage formats, and scalable dataset workflows
• Familiarity with CI/CD, testing, and engineering quality practices for ML systems
• Experience with multimodal model training/evaluation (text + image/audio/video)
• Experience with long-context evaluation and/or model adaptation for long-context tasks
• Experience with agentic or multi-turn evaluation harnesses, tool-use simulation, or interactive environment testing
• Experience working in customer-facing technical consulting, solutions engineering, or applied research delivery
• Familiarity with LLM safety, alignment, robustness, or red-teaming evaluation approaches
• Contributions to open-source ML/LLM tooling or published technical work in relevant areas
Company Overview
• Innodata (NASDAQ: INOD) is a global data engineering company built on the belief that data and AI are inextricably linked. It was founded in 1988 and is headquartered in Hackensack, New Jersey, USA, with a workforce of 5,001-10,000 employees. Its website is http://www.innodata.com.
Company H1B Sponsorship
• Innodata Inc. has a track record of offering H1B sponsorship, with 2 sponsorships in 2024. Please note that this does not guarantee sponsorship for this specific role.