Research Scientist
Active Inference Benchmarking Researcher

Overview

Contribute to the design, implementation, and evaluation of benchmarking frameworks for uncertainty-aware autonomy, specifically active inference, within a teleoperation-augmented robotics platform. This role focuses on quantifying how probabilistic decision-making improves human-in-the-loop scalability, safety under uncertainty, and autonomous productivity across real-world robotic systems.

Key Responsibilities

1. Active Inference Benchmark Design & Execution
- Co-design and implement benchmarking protocols comparing active inference agents to:
  - Conventional reinforcement learning (RL) baselines
  - RL systems augmented with uncertainty estimation
- Evaluate performance across:
  - Data efficiency
  - Safety under distribution shift
  - Directed exploration
  - Sim-to-real robustness
  - Teleoperation scaling efficiency
  - Explainability

2. Teleoperation-Aware Evaluation Framework
- Integrate benchmarking into a standardized teleoperation control protocol where agents decide when to:
  - Continue autonomous execution
  - Request human takeover under a constrained intervention budget
- Develop metrics capturing:
  - Human scalability (operator-to-robot ratio, intervention allocation efficiency)
  - Safety under uncertainty (timeliness and selectivity of handovers)
  - Autonomous work efficiency (task completion under limited supervision)

3. Platform Integration (Teleoperation Stack)
- Align benchmarking workloads with the broader teleoperation platform architecture:
  - On-robot control and safety systems
  - Near-edge inference (uncertainty estimation, planning, intervention logic)
  - Cloud-based training, analytics, and fleet management
- Ensure benchmarks reflect real system constraints:
  - Latency budgets
  - Network degradation and connectivity loss
  - Multi-robot resource sharing

4. Embodiment Ladder Evaluation
- Execute experiments across a staged pipeline:
  - Tier 1: Controlled simulation (e.g., MuJoCo environments)
  - Tier 2: High-fidelity robotic simulation (e.g., RLBench, ManiSkill)
  - Tier 3: Real-world or dataset-driven validation
- Maintain consistency via a shared teleoperation surrogate (an expert policy or planner) that emulates human intervention

5. Uncertainty & Intervention Analysis
- Quantify and analyze (see the metrics sketch after the responsibilities list):
  - Calibration of uncertainty signals
  - Intervention precision and recall
  - Learning from intervention (post-handover improvement)
  - Stability across repeated autonomy-human control cycles
- Compare which approach best optimizes teleoperation efficiency:
  - Natively probabilistic methods (active inference)
  - Retrofitted uncertainty (ensembles, Bayesian heads, etc.)
  - Heuristic baselines

6. Systems & Scaling Insights
- Profile the compute and system behavior of active inference workloads within the teleoperation stack:
  - World model rollouts
  - Posterior inference
  - Intervention decision logic
- Contribute to (see the fleet-sizing sketch after the responsibilities list):
  - Near-edge workload allocation strategies
  - Fleet scaling models (robots per server)
  - Latency vs. safety tradeoffs

7. Deliverables
- Reproducible benchmarking suite and datasets
- Technical reports and whitepapers
- Conference publications (robotics, ML, and systems venues)
- Design recommendations for teleoperation and autonomy stacks
- Cross-team guidance for infrastructure, controls, and ML teams
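As a concrete illustration of the intervention metrics named in responsibilities 2 and 5, the sketch below computes intervention precision/recall and a simple expected calibration error from logged handover events. This is a minimal sketch under assumed names: the InterventionEvent schema, its fields, and the binning scheme are illustrative, not the platform's actual API.

    # Minimal sketch of intervention metrics (assumed log schema, not the
    # platform's actual API). Each event records whether the agent requested
    # a human takeover and whether, in hindsight, one was actually needed.

    from dataclasses import dataclass

    @dataclass
    class InterventionEvent:
        requested: bool      # agent asked for human takeover
        needed: bool         # hindsight label: autonomy would have failed
        uncertainty: float   # agent's predicted failure probability in [0, 1]

    def precision_recall(events):
        """Intervention precision/recall: were handover requests selective
        (precision), and did they catch the failures that mattered (recall)?"""
        tp = sum(e.requested and e.needed for e in events)
        fp = sum(e.requested and not e.needed for e in events)
        fn = sum(e.needed and not e.requested for e in events)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    def expected_calibration_error(events, n_bins=10):
        """ECE over the agent's failure-probability estimates: within each
        bin, compare mean predicted risk with the observed failure rate."""
        bins = [[] for _ in range(n_bins)]
        for e in events:
            idx = min(int(e.uncertainty * n_bins), n_bins - 1)
            bins[idx].append(e)
        ece = 0.0
        for bucket in bins:
            if not bucket:
                continue
            mean_conf = sum(e.uncertainty for e in bucket) / len(bucket)
            observed = sum(e.needed for e in bucket) / len(bucket)
            ece += len(bucket) / len(events) * abs(mean_conf - observed)
        return ece

    if __name__ == "__main__":
        log = [
            InterventionEvent(True, True, 0.9),
            InterventionEvent(True, False, 0.7),
            InterventionEvent(False, False, 0.1),
            InterventionEvent(False, True, 0.2),
        ]
        p, r = precision_recall(log)
        ece = expected_calibration_error(log)
        print(f"precision={p:.2f} recall={r:.2f} ece={ece:.2f}")

High precision with low recall indicates an overly timid agent that rarely asks for help but misses real failures; the inverse indicates an agent that burns the intervention budget on false alarms.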
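Similarly, for the fleet scaling models in responsibility 6, a back-of-envelope sketch of robots per near-edge server under a per-decision latency budget. Every number, the sequential-service assumption, and the robots_per_server helper are illustrative assumptions, not measured platform figures.

    # Back-of-envelope fleet sizing sketch (illustrative numbers only):
    # how many robots can one near-edge server host while every robot's
    # intervention decision still lands inside its latency budget?

    def robots_per_server(latency_budget_ms: float,
                          network_rtt_ms: float,
                          inference_ms_per_robot: float,
                          safety_margin: float = 0.8) -> int:
        """Assume decisions for co-hosted robots are served sequentially,
        so each robot's worst-case wait grows with fleet size. The safety
        margin reserves headroom for jitter and degraded connectivity."""
        usable_ms = latency_budget_ms * safety_margin - network_rtt_ms
        if usable_ms <= 0:
            return 0
        return max(int(usable_ms // inference_ms_per_robot), 0)

    # Example: a 200 ms decision budget, 40 ms round trip, and 15 ms of
    # world-model rollout plus posterior inference per robot.
    print(robots_per_server(200.0, 40.0, 15.0))  # -> 8

The point of profiling rollout and posterior-inference cost is exactly to replace the assumed 15 ms figure above with measured numbers, which turns the latency vs. safety tradeoff into a concrete robots-per-server curve.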
Success Criteria

- Demonstrated improvement in the intervention-efficiency vs. safety tradeoff
- Measurable gains in operator scaling (robots per human)
- Robust performance under distribution shift and real-world noise
- Clear evidence of when and why uncertainty-aware methods outperform baselines

About the Company

Noumenal Labs is a deep tech AI company closing performance gaps in outdoor robotics. Our uncertainty-aware systems learn and adapt in real time, positioning Noumenal as a core software layer for next-generation robotic hardware operating in uncharted domains.