| Generative Image as Action Models | Jul 10, 2024 | Image Generation, Robot Manipulation | Code Available | 2 |
| Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models | Jun 7, 2024 | Denoising, Image Generation | Code Available | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Oct 2, 2024 | Motion Planning, Robot Manipulation | Code Available | 2 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | Hallucination, Robot Manipulation | Code Available | 2 |
| What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Aug 6, 2021 | Imitation Learning, reinforcement-learning | Code Available | 2 |
| Equivariant Diffusion Policy | Jul 1, 2024 | Imitation Learning, Robot Manipulation | Code Available | 2 |
| RVT: Robotic View Transformer for 3D Object Manipulation | Jun 26, 2023 | Object, Robot Manipulation | Code Available | 2 |
| Robot Trajectron: Trajectory Prediction-based Shared Control for Robot Manipulation | Feb 4, 2024 | Position, Robot Manipulation | Code Available | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Oct 4, 2024 | Chunking, Language Modeling | Code Available | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Jul 28, 2023 | Object, Question Answering | Code Available | 2 |
| SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion | Sep 8, 2022 | Motion Planning, Robot Manipulation | Code Available | 2 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Mar 31, 2025 | Robot Manipulation, Scheduling | Code Available | 2 |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | Sep 12, 2022 | Robot Manipulation, Robot Manipulation Generalization | Code Available | 2 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection, Pose Prediction | Code Available | 2 |
| R3M: A Universal Visual Representation for Robot Manipulation | Mar 23, 2022 | Contrastive Learning, Robot Manipulation | Code Available | 2 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Jun 27, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Mar 25, 2025 | Denoising, Robot Manipulation | Code Available | 2 |
| Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos | Dec 5, 2024 | Robot Manipulation | Code Available | 2 |
| An Embodied Generalist Agent in 3D World | Nov 18, 2023 | 3D dense captioning, 3D Question Answering (3D-QA) | Code Available | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Dec 17, 2024 | Denoising | Code Available | 2 |
| CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning | Dec 12, 2022 | Data Augmentation, Image Generation | Code Available | 1 |
| ABNet: Attention BarrierNet for Safe and Scalable Robot Learning | Jun 18, 2024 | Autonomous Driving, Robot Manipulation | Code Available | 1 |
| CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks | Dec 6, 2021 | Continuous Control, Imitation Learning | Code Available | 1 |
| Goal-Conditioned Imitation Learning using Score-based Diffusion Policies | Apr 5, 2023 | Denoising, Imitation Learning | Code Available | 1 |