| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Oct 4, 2024 | Chunking, Language Modeling | Code Available | 2 |
| Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy | Oct 2, 2024 | Motion Planning, Robot Manipulation | Code Available | 2 |
| GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy | Aug 26, 2024 | Few-Shot Learning, Image Generation | Code Available | 2 |
| Generative Image as Action Models | Jul 10, 2024 | Image Generation, Robot Manipulation | Code Available | 2 |
| Equivariant Diffusion Policy | Jul 1, 2024 | Imitation Learning, Robot Manipulation | Code Available | 2 |
| RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation | Jun 27, 2024 | Language Modeling | Code Available | 2 |
| Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models | Jun 7, 2024 | Denoising, Image Generation | Code Available | 2 |
| Robot Trajectron: Trajectory Prediction-based Shared Control for Robot Manipulation | Feb 4, 2024 | Position, Robot Manipulation | Code Available | 2 |
| Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation | Dec 20, 2023 | Robot Manipulation, Zero-shot Generalization | Code Available | 2 |
| An Embodied Generalist Agent in 3D World | Nov 18, 2023 | 3D Dense Captioning, 3D Question Answering (3D-QA) | Code Available | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Jul 28, 2023 | Object, Question Answering | Code Available | 2 |
| VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | Jul 12, 2023 | Form, Language Modelling | Code Available | 2 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection, Pose Prediction | Code Available | 2 |
| RVT: Robotic View Transformer for 3D Object Manipulation | Jun 26, 2023 | Object, Robot Manipulation | Code Available | 2 |
| FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation | May 22, 2023 | Imitation Learning, Motion Planning | Code Available | 2 |
| VIMA: General Robot Manipulation with Multimodal Prompts | Oct 6, 2022 | Imitation Learning, Language Modelling | Code Available | 2 |
| Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation | Sep 12, 2022 | Robot Manipulation, Robot Manipulation Generalization | Code Available | 2 |
| SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion | Sep 8, 2022 | Motion Planning, Robot Manipulation | Code Available | 2 |
| R3M: A Universal Visual Representation for Robot Manipulation | Mar 23, 2022 | Contrastive Learning, Robot Manipulation | Code Available | 2 |
| What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Aug 6, 2021 | Imitation Learning, Reinforcement Learning | Code Available | 2 |
| 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model | Jun 6, 2025 | Optical Flow Estimation, Robot Manipulation | Code Available | 1 |
| Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches | May 14, 2025 | Action Generation, Image Generation | Code Available | 1 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot Manipulation, Spatial Reasoning | Code Available | 1 |
| Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Apr 3, 2025 | 3D Object Detection, Cross-modal Alignment | Code Available | 1 |