AI Research Scientist (United States, Remote)
AI Research Jobs in the United States (Remote, Full-Time)
You will run applied AI research projects for US-based customers via Rex.zone, translating open-ended research questions into measurable experiments across LLM evaluation, RLHF data design, prompt evaluation, and model performance improvement.
What You Will Do
⢠Own end-to-end applied research cycles: problem framing, baselines, ablations, and reporting
⢠Build and evaluate LLM systems using offline metrics and human-in-the-loop evaluation
⢠Design RLHF workflows: preference data specs, rater instructions, prompt sets, and rubric-based grading
⢠Create evaluation datasets and test suites: prompt evaluation, red-teaming prompts, and content safety labeling protocols
⢠Collaborate with data labeling teams on taxonomy, edge-case coverage, and training data quality
⢠Perform error analysis and model debugging to improve robustness, safety, and helpfulness
⢠Document methodology and results for reproducibility and auditability
Required Qualifications
⢠Mid-Senior experience delivering applied ML research or productionized ML evaluation
⢠Strong Python skills; experience with PyTorch (or similar)
⢠Hands-on LLM evaluation, prompt evaluation, or RLHF experience
⢠Experiment design, metrics selection, and statistically sound interpretation
⢠Familiarity with dataset development: data labeling, QA evaluation, and guideline compliance checks
⢠Strong written communication for research artifacts and cross-functional alignment
Preferred Qualifications
⢠RAG/NER/structured output evaluation experience
⢠Exposure to computer vision or multimodal evaluation
⢠Content safety labeling taxonomies and policy-aligned rubrics
⢠MLOps for evaluation pipelines, dataset versioning, and reproducible runs
Remote Work and Collaboration
Remote, full-time role supporting United States-based projects with distributed teams across research, engineering, and data operations.
Compensation
Hourly base pay range: $30-$50/hr.