| Title | Date | Tasks | Code | Repos |
|---|---|---|---|---|
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 |
| π_0: A Vision-Language-Action Flow Model for General Robot Control | Oct 31, 2024 | Language Modeling | — | 0 |
| Diffusion Transformer Policy | Oct 21, 2024 | Denoising, Vision-Language-Action | Code Available | 2 |
| A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | Oct 21, 2024 | Decision Making, Vision-Language-Action | — | 0 |
| Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand | Oct 17, 2024 | Vision-Language-Action | — | 0 |
| Latent Action Pretraining from Videos | Oct 15, 2024 | Quantization, Robot Manipulation | Code Available | 3 |
| Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation | Oct 10, 2024 | Robot Manipulation, Vision-Language-Action | — | 0 |
| LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | Oct 7, 2024 | Vision-Language-Action | — | 0 |
| Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | Oct 2, 2024 | Vision-Language-Action | — | 0 |
| ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | Sep 23, 2024 | Vision-Language-Action | — | 0 |
| Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | Sep 20, 2024 | Vision-Language-Action | — | 0 |
| TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Sep 19, 2024 | Vision-Language-Action | Code Available | 2 |
| HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | Sep 12, 2024 | Vision-Language-Action | — | 0 |
| OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | Sep 5, 2024 | Autonomous Driving, Motion Planning | — | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous Driving, Caption Generation | — | 0 |
| Robotic Control via Embodied Chain-of-Thought Reasoning | Jul 11, 2024 | Vision-Language-Action | — | 0 |
| Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | Jul 10, 2024 | Common Sense Reasoning, Vision-Language-Action | — | 0 |
| LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Jun 28, 2024 | Vision-Language-Action, World Knowledge | Code Available | 3 |
| OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | Jun 27, 2024 | Decoder, Imitation Learning | — | 0 |
| Towards Natural Language-Driven Assembly Using Foundation Models | Jun 23, 2024 | Friction, Vision-Language-Action | — | 0 |
| OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Imitation Learning, Language Modeling | Code Available | 9 |
| RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation | Jun 6, 2024 | Common Sense Reasoning, Mamba | — | 0 |
| Vision-Language Meets the Skeleton: Progressive Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | May 31, 2024 | Action Recognition, Contrastive Learning | Code Available | 0 |
| A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code Available | 4 |
| LEGENT: Open Platform for Embodied Agents | Apr 28, 2024 | Vision-Language-Action | — | 0 |