LLM Evaluation Engineer

Remote, Full-time
Job Description:
• Build the evaluation layer in the ThirdLaw platform for LLM prompts and responses
• Design and tune real-time guardrails, classifiers, and semantic judgment systems
• Implement evaluation strategies with semantic similarity, foundation model scoring, and rule-based systems
• Integrate model outputs with downstream enforcement actions (e.g. redaction, escalation, blocking)
• Prototype, tune, and productize small language models for classification, labeling, or scoring
• Collaborate with data infrastructure engineers to connect evaluation logic with ingestion and storage
• Build tools to observe, debug, and improve evaluator performance across data distributions
• Define abstractions for reusable evaluation components that can scale across use cases

Requirements:
• 7+ years of experience in ML systems or AI engineering roles
• At least 1–2 years working directly with LLMs, NLP pipelines, or semantic search
• Deep understanding of foundation models (e.g. OpenAI, Claude, Mistral, Llama) and their APIs
• Hands-on experience with vector search (e.g. FAISS, Qdrant, Weaviate) and embeddings pipelines
• Proven ability to implement real-time or near-real-time evaluation logic using semantic similarity, classifier scoring, or structured rules
• Strong Python skills, with experience using libraries such as Hugging Face Transformers, LangChain, and PyTorch or TensorFlow
• Ability to reason about model behavior, test prompt configurations, and debug complex decision logic in production

Benefits:
• Generous benefits
• Market cash compensation
• Above-market equity
• Well-designed benefits
