LLM Evaluation Engineer

Remote Full-time
Job Description:
• Build the evaluation layer in the ThirdLaw platform for LLM prompts and responses
• Design and tune guardrails, classifiers, and semantic judgment systems in real-time
• Implement evaluation strategies with semantic similarity, foundation model scoring, and rule-based systems
• Integrate model outputs with downstream enforcement actions (e.g. redaction, escalation, blocking)
• Prototype, tune, and productize small language models for classification, labeling, or scoring
• Collaborate with data infrastructure engineers to connect evaluation logic with ingestion and storage
• Build tools to observe, debug, and improve evaluator performance across data distributions
• Define abstractions for reusable evaluation components that can scale across use cases

Requirements:
• 7+ years of experience in ML systems or AI engineering roles
• At least 1–2 years working directly with LLMs, NLP pipelines, or semantic search
• Deep understanding of foundation models (e.g. OpenAI, Claude, Mistral, Llama) and APIs
• Hands-on experience with vector search (e.g. FAISS, Qdrant, Weaviate) and embeddings pipelines
• Proven ability to implement real-time or near-real-time evaluation logic using semantic similarity, classifier scoring, or structured rules
• Strong in Python, with familiarity using libraries like Hugging Face Transformers, LangChain, and PyTorch or TensorFlow
• Ability to reason about model behavior, test prompt configurations, and debug complex decision logic in production

Benefits:
• Generous benefits
• Market cash compensation
• Above-market equity
• Well-designed benefits
