| OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Imitation Learning, Language Modelling | Code Available | 9 | 5 |
| On the Vulnerability of LLM/VLM-Controlled Robotics | Feb 15, 2024 | Language Modelling, Robot Manipulation | Code Available | 7 | 5 |
| Evaluating Real-World Robot Manipulation Policies in Simulation | May 9, 2024 | Robotic Grasping, Robot Manipulation | Code Available | 5 | 5 |
| Magma: A Foundation Model for Multimodal AI Agents | Feb 18, 2025 | Autonomous Web Navigation, Image to text | Code Available | 5 | 5 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | May 9, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 5 | 5 |
| 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Mar 6, 2024 | Imitation Learning, Robot Manipulation | Code Available | 5 | 5 |
| Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Dec 19, 2024 | Contrastive Learning, Image Reconstruction | Code Available | 3 | 5 |
| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 | 5 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | May 6, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 3 | 5 |
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Jun 5, 2023 | Benchmarking | Code Available | 3 | 5 |
| Affordance-based Robot Manipulation with Flow Matching | Sep 2, 2024 | Action Generation, Robot Manipulation | Code Available | 3 | 5 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 16, 2024 | Denoising, Robot Manipulation | Code Available | 3 | 5 |
| PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos | Mar 23, 2025 | 4D reconstruction, Deformable Object Manipulation | Code Available | 3 | 5 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 18, 2024 | Denoising, Robot Manipulation | Code Available | 3 | 5 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Feb 18, 2025 | Object Rearrangement, Robot Manipulation | Code Available | 3 | 5 |
| RT-1: Robotics Transformer for Real-World Control at Scale | Dec 13, 2022 | Diversity, Robot Manipulation | Code Available | 3 | 5 |
| RLVR-World: Training World Models with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 | 5 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Dec 18, 2024 | Representation Learning, Robot Manipulation | Code Available | 3 | 5 |
| Latent Action Pretraining from Videos | Oct 15, 2024 | Quantization, Robot Manipulation | Code Available | 3 | 5 |
| RVT-2: Learning Precise Manipulation from Few Demonstrations | Jun 12, 2024 | Robot Manipulation, Robot Manipulation Generalization | Code Available | 3 | 5 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Aug 26, 2024 | Few-Shot Learning, Image Generation | Code Available | 2 | 5 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 | 5 |
| What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Aug 6, 2021 | Imitation Learning, reinforcement-learning | Code Available | 2 | 5 |
| Equivariant Diffusion Policy | Jul 1, 2024 | Imitation Learning, Robot Manipulation | Code Available | 2 | 5 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Mar 25, 2025 | Denoising, Robot Manipulation | Code Available | 2 | 5 |
| Generative Image as Action Models | Jul 10, 2024 | Image Generation, Robot Manipulation | Code Available | 2 | 5 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Dec 20, 2023 | Robot Manipulation, Zero-shot Generalization | Code Available | 2 | 5 |
| VIMA: General Robot Manipulation with Multimodal Prompts | Oct 6, 2022 | Imitation Learning, Language Modelling | Code Available | 2 | 5 |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | Jul 12, 2023 | Form, Language Modelling | Code Available | 2 | 5 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Dec 17, 2024 | Denoising | Code Available | 2 | 5 |
| Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models | Jun 7, 2024 | Denoising, Image Generation | Code Available | 2 | 5 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 | 5 |
| RVT: Robotic View Transformer for 3D Object Manipulation | Jun 26, 2023 | Object, Robot Manipulation | Code Available | 2 | 5 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Jul 28, 2023 | Object, Question Answering | Code Available | 2 | 5 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Oct 4, 2024 | Chunking, Language Modeling | Code Available | 2 | 5 |
| Robot Trajectron: Trajectory Prediction-based Shared Control for Robot Manipulation | Feb 4, 2024 | Position, Robot Manipulation | Code Available | 2 | 5 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Oct 2, 2024 | Motion Planning, Robot Manipulation | Code Available | 2 | 5 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Mar 31, 2025 | Robot Manipulation, Scheduling | Code Available | 2 | 5 |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | Sep 12, 2022 | Robot Manipulation, Robot Manipulation Generalization | Code Available | 2 | 5 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection, Pose Prediction | Code Available | 2 | 5 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Jun 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| R3M: A Universal Visual Representation for Robot Manipulation | Mar 23, 2022 | Contrastive Learning, Robot Manipulation | Code Available | 2 | 5 |
| Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos | Dec 5, 2024 | Robot Manipulation | Code Available | 2 | 5 |
| FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | May 22, 2023 | Imitation Learning, Motion Planning | Code Available | 2 | 5 |
| SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion | Sep 8, 2022 | Motion Planning, Robot Manipulation | Code Available | 2 | 5 |
| An Embodied Generalist Agent in 3D World | Nov 18, 2023 | 3D dense captioning, 3D Question Answering (3D-QA) | Code Available | 2 | 5 |
| ABNet: Attention BarrierNet for Safe and Scalable Robot Learning | Jun 18, 2024 | Autonomous Driving, Robot Manipulation | Code Available | 1 | 5 |
| DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks | Apr 25, 2024 | Robot Manipulation | Code Available | 1 | 5 |
| GUARD: A Safe Reinforcement Learning Benchmark | May 23, 2023 | Autonomous Driving, Diversity | Code Available | 1 | 5 |
| BundleTrack: 6D Pose Tracking for Novel Objects without Instance or Category-Level 3D Models | Aug 1, 2021 | 3D Object Tracking, 6D Pose Estimation | Code Available | 1 | 5 |