Senior Engineer - AI Evaluator

Remote Full-time
Senior AI Interaction Evaluator (Codex / Claude Code)

Contract | $100–$200/hour | 10–20 hrs/week | Start ASAP (through early May)

We’re looking for a highly experienced software engineer (Senior+) to help evaluate the quality of interactions with modern coding agents such as OpenAI Codex and Claude Code.

This is not a traditional engineering role.
You won’t be writing production code.

You’ll be evaluating something harder: whether the model thinks like a great engineer.

What This Role Actually Is

You will assess how AI coding agents behave in real-world scenarios — focusing on:
• Whether the response makes sense
• Whether the preamble and reasoning are useful
• Whether the output reflects strong engineering judgment
• Whether the interaction feels right to an experienced developer

This role is about engineering taste, not syntax correctness.

What You’ll Be Doing
• Evaluate AI-generated coding interactions end-to-end
• Judge whether outputs are:
    • Useful
    • Correct (at a high level)
    • Aligned with how a strong engineer would think
• Assess the quality of explanations and reasoning, not just code
• Distinguish between different levels of response quality (e.g. what makes something a 2 vs 4)
• Provide clear, opinionated feedback on:
    • What worked
    • What didn’t
    • What felt “off” or misleading
• Help define what great looks like when interacting with tools like Cursor

What We Mean by “Taste”

We’re specifically looking for engineers who can answer questions like:
• Does this feel like something a strong engineer would actually say?
• Is this explanation helpful, or just technically correct?
• Is the model guiding the user well, or just dumping output?
• Would this interaction build or erode trust?

You should be comfortable making subjective but rigorous judgments.

Who You Are
• Staff / Principal-level engineer (or equivalent experience)
• Strong background in at least one of the following:
    • TypeScript / JavaScript
    • Python
• Hands-on experience using:
    • OpenAI Codex
    • Claude Code
    • Cursor
• Deep familiarity with modern AI-assisted dev workflows
• Able to evaluate code without needing to fully execute or deeply review every line
• Comfortable giving direct, opinionated feedback
• High bar for what “good engineering” looks like

Nice to Have
• Experience with tools like Cursor or similar AI-first IDEs
• Prior exposure to prompt design or evaluation workflows
• Experience mentoring senior engineers or defining engineering standards

Engagement Details
• Rate: $100–$200/hour
• Hours: ~10–20 hours/week
• Duration: Through early May (with possible extension)
• Start: ASAP
• Process:
    • Take-home evaluation exercise
    • One behavioral interview
