[Remote] QA Engineer, AI Products

Remote Full-time

Note: This is a remote position open to candidates in the USA.

MDCalc is a leading medical reference widely used by clinicians to improve patient outcomes. They are seeking a QA Engineer to ensure the quality and reliability of their AI-powered features, focusing on testing LLM-based systems and collaborating with cross-functional teams.

Responsibilities

- Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
- Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
- Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
- Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost) and establish thresholds for release readiness
- Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to define what 'good' looks like for AI responses
- Investigate and triage AI failure modes, distinguishing model issues, prompt issues, retrieval issues, and integration bugs
- Participate in team discussions, offering feedback on testability, risks, prompt design, and guardrails
- Help develop QA strategies to expand future testing capacity, automation, and evaluation coverage as the AI product surface grows

Skills

- 5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
- Strong understanding of QA principles, test case creation/documentation, and best practices for both deterministic and non-deterministic systems
- Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
- Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
- Proficiency with test automation tools, with a focus on Playwright
- Strong SQL skills for data validation, test data creation, and verifying data integrity across systems
- Familiarity with token usage, latency profiling, and cost monitoring as quality signals
- Eagerness to learn quickly and a positive, solutions-oriented attitude
- Clear and concise communicator, able to surface issues, blockers, and risks effectively, including when communicating ambiguous or probabilistic failures
- Self-motivated, proactive, and able to manage time and priorities independently

Benefits

- Medical, dental, and vision coverage, with the option to extend to your dependents
- Company-sponsored short-term insurance
- Fully paid 8-week parental leave after 6 months of employment
- Company-sponsored 401k after 3 months of employment
- Unlimited vacation for salaried roles: we trust you to take the time you need
- Bi-annual company offsites to connect, reflect, and plan together
- Monthly work-from-home stipend
- A culture of fun and motivated team members who believe in a greater mission here at MDCalc

Company Overview

MDCalc is used by over 2/3 of US physicians and provides free access to 800+ medical scores, calculations, and algorithms. It was founded in 2005 and is headquartered in New York, New York, USA, with a workforce of 11-50 employees. Its website is https://www.mdcalc.com.

