[Remote] Evaluation Engineer

Remote Full-time
Note: This is a remote job open to candidates in the USA. Elicit is an AI research platform that uses language models to help researchers make better decisions. The Evaluation Engineer will own the technical foundation of auto-evaluation systems, ensuring they are fast, reliable, and user-friendly while focusing on decision-making in pharma.

Responsibilities

You'll build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:
- You'll build a lightning-fast basic evals infrastructure that schedules tasks with practically no added latency, and then you'll figure out clever ways to address the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs)
- ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into
- Product managers need dashboards showing performance over time and what's going wrong in production
- Your code must be well-architected so other team members and ML engineers can understand and build on it

We need to evaluate how well Elicit actually helps with decision-making in pharma, not just measure what's easy to measure:
- This requires encoding real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards)
- You'll provide appropriate statistical tests and confidence intervals so we can trust our results

In a typical month, expect to spend:
- 60% working on the core eval platform
- 15% working closely with the evals team to build and improve specific evals (e.g., an eval of our paper search within our systematic review flow)
- 10% mentoring our evals engineering intern
- The rest on learning how people interact with the eval system so you can make it work better for them, and understanding what our users want from Elicit so evals measure what matters

Skills

- At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)
- Aptitude and interest in evaluating how Elicit helps with pharma decision-making. There's no particular experience you must have, but we'll evaluate your aptitude
- Knowledge of statistics (e.g., for calculating power and credence intervals for evals)
- Experience with advanced Python (asyncio/trio and parallel processing strategies)
- Front-end experience and strong UX sensibility (you'll be building dashboards). TypeScript experience is a plus
- Experience building developer tools (ML engineers are one of your most important clients)
- Previous experience as a data engineer or working on AI infrastructure
- Knowledge of pharma/biomed
- Experience evaluating ML systems
- Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)

Benefits

- Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
- Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
- Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
- 401K with a 6% employer match
- A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
- $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
- A team administrative assistant who can help you with personal and work tasks

Company Overview

Elicit uses language models to help users automate research workflows. It was founded in 2023, and is headquartered in Oakland, California, USA, with a workforce of 11-50 employees. Its website is https://elicit.com.

