AI Model Evaluator (LLM & Agent Systems)

Remote Full-time

Job Title: AI Model Evaluator (LLM & Agent Systems)

Job Type: Contract (Minimum 2 weeks, with potential extension)

Location: Remote

Job Summary:

Join our customer's team as an AI Model Evaluator (LLM & Agent Systems) and play a pivotal role in shaping the future of generative AI and autonomous agents. You'll help benchmark, analyze, and assess cutting-edge AI systems in real-world scenarios, providing structured insights that drive improvements. This position is ideal for analytical professionals passionate about AI quality and real-world impact.

Key Responsibilities:
• Evaluate outputs from large language models (LLMs) and autonomous agent systems against defined guidelines and rubrics
• Review multi-step agent actions, including screenshots and reasoning traces, to determine accuracy and quality
• Consistently apply evaluation standards, flagging edge cases and identifying recurring patterns or failure modes
• Provide detailed, structured feedback to inform benchmarking, product evolution, and model refinement
• Participate in calibration and alignment sessions to ensure consistent application of evaluation criteria
• Work collaboratively to adapt to evolving scenarios and ambiguous evaluation situations
• Document findings and communicate insights clearly, both in writing and verbally, to relevant stakeholders

Required Skills and Qualifications:
• Demonstrated experience with LLM evaluation, AI output analysis, QA/testing, UX research, or similar analytical roles
• Strong background in AI model evaluation, benchmarking, and applying rubric-based scoring frameworks
• Exceptional attention to detail and sound judgement in ambiguous or edge-case scenarios
• Proficiency in English (B2+ or equivalent) with excellent written and verbal communication skills
• Ability to adapt quickly to evolving guidelines and work independently
• Comfort with remote work and a commitment of at least 20 hours per week for the initial term
• Analytical mindset with a focus on actionable, qualitative feedback

Preferred Qualifications:
• Experience with RLHF, annotation workflows, or AI benchmarking frameworks
• Familiarity with autonomous agent systems or workflow automation tools
• Background in evaluating mobile apps or other digital products

Required Skills
• LLMs
• Generative AI
• AI Model Evaluation
• AI Benchmarking
• AI Quality Assessment
• Model Performance Evaluation
• Prompt Response Evaluation
• AI Output Analysis
• Rubric-Based Scoring


