Part-Time Benchmarking Engineer (Remote)

Remote Part-time

Join Our Team and Shape the Future of AI Benchmarking!

We're seeking a skilled and motivated Part-Time Benchmarking Engineer to play a key role in maintaining and improving our public LLM benchmarks. As a remote team member, you'll enjoy the freedom and flexibility to work from anywhere, with a competitive salary and the opportunity to grow with our company.

About the Role: As a Benchmarking Engineer, you will be responsible for creating new datasets, running benchmarks against new models, and analyzing results to provide valuable insights. You will have significant ownership of our benchmarking site and the opportunity to propose new benchmarks based on your ideas and hypotheses.

Key Responsibilities:

Create new, private datasets in conjunction with our data annotators and partner groups
Run existing benchmarks against new models and compile results
Write free-text analyses of raw quantitative results to answer key questions about model performance
Create social media posts to share our findings with the community
Maintain and improve scripts used to run benchmarks against our datasets

Requirements:

Deep experience with Python
Strong communication and writing skills
Experience working in teams, including development sprints and Git
Availability of approximately 20 hours per week

Nice to Haves:

Familiarity with LLM methods and developments
Experience in an ML research setting or in data science

About Us: At Vals AI, we're building the enterprise benchmark for LLMs and LLM apps on real-world business tasks. Our mission is to create the infrastructure and certification to automatically audit LLM applications, verifying they are ready for consumption. We're a team of talented individuals with a passion for AI and a drive to make a difference.

What We Offer:

Competitive salary
Option to work from our SF office
Opportunity to grow into a full-time role
Collaborative and dynamic work environment

How to Apply: If you're a motivated and skilled individual with a passion for AI, we encourage you to apply for this exciting opportunity. Don't worry if you don't meet every single requirement - we value a great attitude and a willingness to learn above all.


