LLM Evaluation Engineer

Remote, Full-time
Job Description:
• Build the evaluation layer in the ThirdLaw platform for LLM prompts and responses
• Design and tune real-time guardrails, classifiers, and semantic judgment systems
• Implement evaluation strategies using semantic similarity, foundation-model scoring, and rule-based systems
• Integrate model outputs with downstream enforcement actions (e.g. redaction, escalation, blocking)
• Prototype, tune, and productize small language models for classification, labeling, or scoring
• Collaborate with data infrastructure engineers to connect evaluation logic with ingestion and storage
• Build tools to observe, debug, and improve evaluator performance across data distributions
• Define abstractions for reusable evaluation components that can scale across use cases

Requirements:
• 7+ years of experience in ML systems or AI engineering roles
• At least 1–2 years working directly with LLMs, NLP pipelines, or semantic search
• Deep understanding of foundation models (e.g. OpenAI GPT models, Claude, Mistral, Llama) and their APIs
• Hands-on experience with vector search (e.g. FAISS, Qdrant, Weaviate) and embedding pipelines
• Proven ability to implement real-time or near-real-time evaluation logic using semantic similarity, classifier scoring, or structured rules
• Strong Python skills, with familiarity with libraries such as Hugging Face Transformers, LangChain, and PyTorch or TensorFlow
• Ability to reason about model behavior, test prompt configurations, and debug complex decision logic in production

Benefits:
• Generous benefits
• Market-rate cash compensation
• Above-market equity
• Well-designed benefits
