[Remote] AI Evaluations Engineer
Note: This is a remote position open to candidates in the USA.

Ellipsis Health is a health technology company seeking an AI Evaluations Engineer to join its AI Evaluation team. The role involves building infrastructure and tooling for AI system evaluations, developing evaluation frameworks, and improving developer experience.

Responsibilities
- Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including an automated testing platform for AI voice agents and debugging and observability tools
- Develop and productionize evaluation frameworks for individual system components such as ASR, LLMs, TTS, knowledge bases, and guardrails
- Partner with ML, engineering, and QA teams to translate evaluation requirements into robust, maintainable infrastructure and tooling
- Improve developer experience by making evaluation systems easy to extend, well-documented, and reliable in day-to-day use
- Ensure evaluation tooling meets production standards for reliability, performance, and maintainability

Skills
- 5+ years of professional software engineering experience, with a strong focus on building backend systems, platforms, or developer tooling
- Proven experience designing and maintaining production-grade infrastructure with code, including APIs, services, and data pipelines
- Strong proficiency in at least one general-purpose programming language (e.g., Python, TypeScript/JavaScript, Java, or similar)
- Experience using test automation frameworks, evaluation pipelines, or CI/CD-integrated testing systems
- Familiarity with observability and debugging tools (logging, metrics, tracing) and with building internal tools that improve developer and QA workflows
- Strong debugging skills and a methodical approach to diagnosing production and evaluation issues
- Ability to collaborate effectively across engineering, QA, and operations teams, translating requirements into reliable, maintainable systems
- Product-minded approach to infrastructure, with attention to usability, documentation, and long-term maintainability
- Experience working with complex, multi-component systems (e.g., ASR, LLMs, TTS, or other ML-powered services)
- Experience working in healthcare or other regulated environments, including awareness of HIPAA and PHI handling
- Familiarity with conversational AI or voice agents, including multi-turn dialogue, latency constraints, and error recovery
- Familiarity with LLM observability or evaluation tools (e.g., Langfuse, prompt eval frameworks)
- Background in digital health, care coordination, or patient-facing systems

Benefits
- 401(k) matching
- Health, vision, and dental insurance
- Very flexible paid time off

Company Overview
AI Nursing Care Manager. Ellipsis Health was founded in 2017 and is headquartered in San Francisco, California, USA, with a workforce of 11-50 employees. Its website is http://www.ellipsishealth.com.

Company H1B Sponsorship
Ellipsis Health has a track record of offering H1B sponsorships, with 2 in 2026, 6 in 2025, 1 in 2024, 2 in 2023, and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.