[Remote] QA Engineer, AI Products

Remote Full-time
Note: This is a remote position open to candidates in the USA. MDCalc is a leading medical reference tool used by clinicians worldwide, and the company is seeking a QA Engineer to strengthen its AI product team. The role focuses on ensuring the quality and reliability of AI-powered features, particularly through testing LLM-based systems, and involves collaborating with cross-functional teams to define quality metrics and testing strategies.

Responsibilities
• Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
• Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
• Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
• Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost) and establish thresholds for release readiness
• Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to define what 'good' looks like for AI responses
• Investigate and triage AI failure modes, distinguishing model issues, prompt issues, retrieval issues, and integration bugs
• Participate in team discussions, offering feedback on testability, risks, prompt design, and guardrails
• Help develop QA strategies to expand future testing capacity, automation, and evaluation coverage as the AI product surface grows

Skills
• 5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
• Strong understanding of QA principles, test case creation/documentation, and best practices for both deterministic and non-deterministic systems
• Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
• Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
• Proficiency with test automation tools, with a focus on Playwright
• Strong SQL skills for data validation, test data creation, and verifying data integrity across systems
• Familiarity with token usage, latency profiling, and cost monitoring as quality signals
• Eagerness to learn quickly and a positive, solutions-oriented attitude
• Clear and concise communicator, able to surface issues, blockers, and risks effectively, including ambiguous or probabilistic failures
• Self-motivated, proactive, and able to manage time and priorities independently

Benefits
• Medical, Dental, & Vision Coverage, with the option to extend to your dependents
• Company-sponsored short-term insurance
• Fully paid 8-week parental leave after 6 months of employment
• Company-sponsored 401k after 3 months of employment
• Unlimited vacation for salaried roles - we trust you to take the time you need
• Bi-annual company offsites to connect, reflect, and plan together
• Monthly work-from-home stipend
• A culture of fun and motivated team members who believe in a greater mission here at MDCalc

Company Overview
• MDCalc is used by over 2/3 of US physicians and provides free access to 800+ medical scores, calculations, and algorithms. It was founded in 2005 and is headquartered in New York, New York, USA, with a workforce of 11-50 employees. Its website is https://www.mdcalc.com.
