Member of Technical Staff (Data Scientist, Evals)

Remote, Full-time
Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

Responsibilities
• Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
• Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
• Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
• Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
• Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve answer quality

Qualifications
• PhD or MS in a technical field, or equivalent experience
• 4+ years of experience in data science or machine learning
• Strong proficiency in Python and SQL (expected to write production-grade code)
• Experience building within a modern cloud data stack, specifically AWS and Databricks
• Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

Preferred Qualifications
• 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
• Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
• A strong research background, with experience applying research methods to real-world ML problems
• Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets

