| OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Imitation Learning, Language Modelling | Code Available | 9 |
| On the Vulnerability of LLM/VLM-Controlled Robotics | Feb 15, 2024 | Language Modelling, Robot Manipulation | Code Available | 7 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | May 9, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 5 |
| 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Mar 6, 2024 | Imitation Learning, Robot Manipulation | Code Available | 5 |
| Evaluating Real-World Robot Manipulation Policies in Simulation | May 9, 2024 | Robotic Grasping, Robot Manipulation | Code Available | 5 |
| Magma: A Foundation Model for Multimodal AI Agents | Feb 18, 2025 | Autonomous Web Navigation, Image to text | Code Available | 5 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | May 6, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 3 |
| Latent Action Pretraining from Videos | Oct 15, 2024 | Quantization, Robot Manipulation | Code Available | 3 |
| Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Dec 19, 2024 | Contrastive Learning, Image Reconstruction | Code Available | 3 |
| Affordance-based Robot Manipulation with Flow Matching | Sep 2, 2024 | Action Generation, Robot Manipulation | Code Available | 3 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 18, 2024 | Denoising, Robot Manipulation | Code Available | 3 |
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Jun 5, 2023 | Benchmarking | Code Available | 3 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 16, 2024 | Denoising, Robot Manipulation | Code Available | 3 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Dec 18, 2024 | Representation Learning, Robot Manipulation | Code Available | 3 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Feb 18, 2025 | Object Rearrangement, Robot Manipulation | Code Available | 3 |
| RLVR-World: Training World Models with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos | Mar 23, 2025 | 4D reconstruction, Deformable Object Manipulation | Code Available | 3 |
| RT-1: Robotics Transformer for Real-World Control at Scale | Dec 13, 2022 | Diversity, Robot Manipulation | Code Available | 3 |
| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| RVT-2: Learning Precise Manipulation from Few Demonstrations | Jun 12, 2024 | Robot Manipulation, Robot Manipulation Generalization | Code Available | 3 |
| Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos | Dec 5, 2024 | Robot Manipulation | Code Available | 2 |
| Equivariant Diffusion Policy | Jul 1, 2024 | Imitation Learning, Robot Manipulation | Code Available | 2 |
| FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | May 22, 2023 | Imitation Learning, Motion Planning | Code Available | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Dec 17, 2024 | Denoising | Code Available | 2 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection, Pose Prediction | Code Available | 2 |