LLM Evaluation Engineer

Remote Full-time
Job Description:
• Build the evaluation layer in the ThirdLaw platform for LLM prompts and responses
• Design and tune guardrails, classifiers, and semantic judgment systems in real-time
• Implement evaluation strategies with semantic similarity, foundation model scoring, and rule-based systems
• Integrate model outputs with downstream enforcement actions (e.g. redaction, escalation, blocking)
• Prototype, tune, and productize small language models for classification, labeling, or scoring
• Collaborate with data infrastructure engineers to connect evaluation logic with ingestion and storage
• Build tools to observe, debug, and improve evaluator performance across data distributions
• Define abstractions for reusable evaluation components that can scale across use cases

Requirements:
• 7+ years of experience in ML systems or AI engineering roles
• At least 1–2 years working directly with LLMs, NLP pipelines, or semantic search
• Deep understanding of foundation models (e.g. OpenAI, Claude, Mistral, Llama) and APIs
• Hands-on experience with vector search (e.g. FAISS, Qdrant, Weaviate) and embeddings pipelines
• Proven ability to implement real-time or near-real-time evaluation logic using semantic similarity, classifier scoring, or structured rules
• Strong in Python, with familiarity using libraries like Hugging Face Transformers, LangChain, and PyTorch or TensorFlow
• Ability to reason about model behavior, test prompt configurations, and debug complex decision logic in production

Benefits:
• Generous benefits
• Market cash compensation
• Above-market equity
• Well-designed benefits
