Robot Learning Engineer
Wave Recruitment
This robot learning role is with a seriously exciting scale-up. The platform is mature, the data is flowing, and the team is ready to scale its most promising research directions into production-grade manipulation policies.
They need someone to lead the development and deployment of large behaviour models, taking diffusion transformers, VLAs, and language-conditioned policies from the literature onto a real bi-manual humanoid.
This is not a research-only role. You’ll inherit a mature policy training codebase, a VR teleoperation pipeline producing high-frequency multi-modal data, and a Gymnasium environment wrapping a real robot. The work you ship runs on hardware.
The Role
You will architect, train, and deploy end-to-end large behaviour models for bi-manual and mobile manipulation, and lead the maturing of the early-stage RL pipeline.
The key responsibilities
Architect, train, and evaluate end-to-end large behaviour models for bi-manual and mobile manipulation
Advance diffusion transformer policies, mature VLA integration, and develop language conditioning for true multi-task generalisation
Apply RL to refine pre-trained policies: RL token fine-tuning, residual RL, off-policy RL with reference-action regularisation, RL-based fine-tuning of diffusion policies
Build a systematic sim-to-real transfer pipeline, connecting existing simulation infrastructure to training
Deploy and iterate learned policies on physical robot hardware
Mentor junior researchers and engineers, and publish at top-tier venues
What We’re Looking For
Essential:
PhD/MSc in ML, Robotics, CS, or a related field with 4+ years of equivalent industry research experience
Demonstrated expertise in training and deploying learned manipulation policies on real robots
Strong background in at least two of: behaviour cloning, diffusion policies, VLA/VLM architectures, RL for manipulation
PyTorch and large-scale (multi-GPU, distributed) training
Track record of publications at top-tier venues (CoRL, RSS, ICRA, NeurIPS, ICML, ICLR), or equivalent demonstrated research impact through deployed systems, patents, or significant open-source contributions
Strong Python; production-quality research code with proper testing, type hints, and documentation
Useful:
Hands-on experience with humanoid or bi-manual manipulation platforms
Diffusion transformer, ACT, or VLA architectures specifically
Pre-trained vision/language models for robot control (CLIP, DINOv2, PaliGemma)
MuJoCo, Isaac Sim, or ManiSkill for sim-to-real policy training
RL fine-tuning of pre-trained policies (residual RL, DPPO, or similar)
3D perception for policy conditioning (point clouds, keypoints, NeRFs)
Key contribution areas
Policy Architecture & Training
End-to-end large behaviour models for bi-manual and mobile manipulation
Scale and evolve diffusion transformer policies, VLA integration, and language conditioning
Extend the imitation learning pipeline to leverage growing teleoperation datasets
Apply RL to push beyond what imitation alone can reach
Target sub-millimetre precision and contact-rich manipulation
Generalisation & Scaling
Develop policies that generalise across tasks, object categories, and environments
Move from single-task to multi-task and task-conditioned architectures
Design hierarchical behaviour systems for long-horizon manipulation
Investigate data-efficient learning: few-shot adaptation, transfer learning, multi-dataset training
Drive systematic ablations across architectures
Sim-to-Real & Deployment
Build the sim-to-real transfer pipeline: domain randomisation, rendering augmentation, sim-to-real benchmarking
Deploy and iterate learned policies on physical robot hardware
Extend the Gymnasium environment wrapper and integrate with the robot’s control stack
Leverage perception team outputs (keypoints, learned features, 3D point clouds) for policy conditioning
Research Leadership
Track the literature and bring relevant advances back to the team
Identify and propose new research directions aligned with the manipulation roadmap
Mentor junior researchers and engineers
Publish at top-tier venues — conference attendance and open-source contributions are actively supported
What’s On Offer
Join a team with world-class applied research scientists, ML engineers, and robotics software engineers
A mature platform that ships to physical hardware, not slides
Active support for conference attendance and open-source contributions
Competitive compensation
Apply or send your CV to Imogen@waverecruitment.co.uk