[Remote] Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI

Remote Full-time
Note: This is a remote job open to candidates in the USA.

Deloitte is leading an AI-first initiative aimed at transforming the healthcare decision-making process through advanced modeling and reasoning systems. As a Research Engineer, you will design, train, and evaluate models that enhance clinical and operational decision-making, focusing on post-training methodologies and ensuring model behavior aligns with healthcare standards.

Responsibilities

- Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows
- Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability
- Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes
- Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance
- Curate, clean, synthesize, and evaluate large-scale instruction, preference, and domain-specific datasets, with rigorous filtering, deduplication, and quality control
- Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical-expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale
- Implement efficient fine-tuning strategies including LoRA, QLoRA, PEFT, and adapter-based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron-LM, Ray, or equivalent
- Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT-LLM, or TGI
- Train and optimize open-weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on-premise and cloud-hybrid deployment with strong performance-per-dollar
- Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain-specific metrics
- Build healthcare-grade evaluation - held-out clinical benchmarks, deployment regression gates, calibration and uncertainty, factuality against ground truth, and bias/fairness evaluation across patient populations and subgroups - co-designed with clinical experts
- Apply PHI/HIPAA-aware data handling and produce model documentation suitable for regulated clinical use
- Perform red teaming and adversarial testing to identify alignment failures, unsafe behaviors, jailbreak vulnerabilities, and regression risks; collaborate with agentic and application teams to improve tool use, grounding, and long-horizon reasoning

Skills

- Bachelor's degree in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, Computational Linguistics, or a related field
- Demonstrated depth in training and post-training large transformer-based language models in production or research - this is your craft, not coursework or a one-off fine-tune. Genuine depth including SFT and at least one preference-optimization or RL method, evidenced by shipped models, releases, or research
- Hands-on experience with reasoning-model training and/or verifiable-reward (RLVR) workflows
- Strong understanding of modern post-training techniques: SFT, RLHF, PPO, DPO, GRPO, RLAIF, and preference optimization workflows
- Experience with open-weight foundation models such as Llama, Qwen, Mistral, DeepSeek, or equivalent architectures
- Strong expertise in PyTorch and modern deep-learning tooling; experience with distributed training frameworks such as DeepSpeed, FSDP, Megatron-LM, or Ray
- Experience implementing efficient fine-tuning techniques such as LoRA, QLoRA, PEFT, and quantization-aware workflows
- Deep understanding of transformer architectures, tokenization, attention mechanisms, decoding strategies, and model scaling trade-offs
- Strong grasp of LLM evaluation methodologies, benchmarking, reward modeling, and alignment trade-offs; experience with large-scale and synthetic datasets, filtering, deduplication, and quality-control pipelines
- Strong Python engineering skills and production-grade software practices; ability to work through ambiguous, highly complex technical problems in fast-moving environments
- Ability to travel 0-50%, on average, based on the work you do and the clients and industries/sectors you serve
- Limited immigration sponsorship may be available
- Experience building or optimizing reasoning models, agentic models, or tool-using LLM systems
- Familiarity with inference optimization frameworks such as vLLM, TensorRT-LLM, TGI, or Ollama
- Experience with multimodal models, speech models, or domain-specific foundation models; experience using large-scale GPU clusters and distributed compute
- Contributions to open-source AI projects, research publications, benchmark development, or model releases
- Familiarity with safety, governance, and responsible-AI practices; experience in regulated or high-stakes industries such as healthcare, finance, insurance, or public sector

Benefits

- Substantial performance-based incentive opportunity designed to grow with the value you help create - startup-style upside, with the backing of a committed, well-capitalized platform
- You may also be eligible for a discretionary annual incentive based on individual and organizational performance

Company Overview

Deloitte drives progress. Our firms around the world help clients become leaders wherever they choose to compete. It was founded in 2008, and is headquartered in Arlington, Virginia, USA, with a workforce of 10001+ employees. Its website is https://www.clearcarboninc.com.

Company H1B Sponsorship

Deloitte has a track record of offering H1B sponsorships, with 1055 in 2026, 6871 in 2025, 4911 in 2024, 5604 in 2023, 8090 in 2022, 5993 in 2021, 10388 in 2020. Please note that this does not guarantee sponsorship for this specific role.

