| MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation | Mar 17, 2025 | Motion Planning, Vision-Language-Action | —Unverified | 0 |
| ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis | Mar 15, 2025 | Domain Generalization, Robot Manipulation | —Unverified | 0 |
| HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Mar 13, 2025 | Common Sense Reasoning, Denoising | —Unverified | 0 |
| MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | Mar 11, 2025 | Large Language Model, Mixture-of-Experts | —Unverified | 0 |
| Refined Policy Distillation: From VLA Generalists to RL Experts | Mar 6, 2025 | Vision-Language-Action | —Unverified | 0 |
| OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | Mar 5, 2025 | Vision-Language-Action, Zero-shot Generalization | —Unverified | 0 |
| SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | Mar 5, 2025 | Safe Reinforcement Learning, Safety Alignment | —Unverified | 0 |
| Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding | Mar 4, 2025 | Chunking, Vision-Language-Action | —Unverified | 0 |
| A Taxonomy for Evaluating Generalist Robot Policies | Mar 3, 2025 | Robot Manipulation, Vision-Language-Action | —Unverified | 0 |
| DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | Feb 28, 2025 | Imitation Learning, Vision-Language-Action | —Unverified | 0 |
| ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | Feb 26, 2025 | Imitation Learning, Object | —Unverified | 0 |
| Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | Feb 26, 2025 | Instruction Following, Vision-Language-Action | —Unverified | 0 |
| Evolution 6.0: Evolving Robotic Capabilities Through Generative Design | Feb 24, 2025 | Action Generation, Text to 3D | —Unverified | 0 |
| GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Feb 13, 2025 | Contrastive Learning, Video Generation | —Unverified | 0 |
| HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation | Feb 8, 2025 | Robot Manipulation, Vision-Language-Action | —Unverified | 0 |
| Survey on Vision-Language-Action Models | Feb 7, 2025 | Review Generation, Survey | —Unverified | 0 |
| Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture | Feb 6, 2025 | Object, Vision-Language-Action | —Unverified | 0 |
| VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation | Feb 4, 2025 | Decision Making, Sequential Decision Making | —Unverified | 0 |
| UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | Jan 31, 2025 | Robot Manipulation, Vision-Language-Action | —Unverified | 0 |
| Improving Vision-Language-Action Model with Online Reinforcement Learning | Jan 28, 2025 | Reinforcement Learning | —Unverified | 0 |
| FAST: Efficient Action Tokenization for Vision-Language-Action Models | Jan 16, 2025 | Vision-Language-Action | —Unverified | 0 |
| Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Jan 8, 2025 | Robot Manipulation, Text Generation | —Unverified | 0 |
| Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | Jan 6, 2025 | Vision-Language-Action | —Unverified | 0 |
| Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation | Jan 1, 2025 | Vision-Language-Action | —Unverified | 0 |
| SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters | Jan 1, 2025 | Vision-Language-Action | —Unverified | 0 |
| VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | Dec 24, 2024 | Common Sense Reasoning, Transfer Learning | —Unverified | 0 |
| QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | Dec 20, 2024 | Language Modeling | —Unverified | 0 |
| RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | Dec 18, 2024 | Diversity, Imitation Learning | —Unverified | 0 |
| Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | Dec 15, 2024 | Vision-Language-Action | —Unverified | 0 |
| TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | Dec 13, 2024 | Robot Manipulation, Vision-Language-Action | —Unverified | 0 |
| Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | Dec 9, 2024 | Vision-Language-Action | —Unverified | 0 |
| NaVILA: Legged Robot Vision-Language-Action Model for Navigation | Dec 5, 2024 | Navigate, Vision and Language Navigation | —Unverified | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | Dec 2, 2024 | Autonomous Driving, Decision Making | —Unverified | 0 |
| CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | Nov 29, 2024 | Quantization, Vision-Language-Action | —Unverified | 0 |
| GRAPE: Generalizing Robot Policy via Preference Alignment | Nov 28, 2024 | Vision-Language-Action | —Unverified | 0 |
| π_0: A Vision-Language-Action Flow Model for General Robot Control | Oct 31, 2024 | Language Modeling | —Unverified | 0 |
| A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | Oct 21, 2024 | Decision Making, Vision-Language-Action | —Unverified | 0 |
| Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand | Oct 17, 2024 | Vision-Language-Action | —Unverified | 0 |
| Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation | Oct 10, 2024 | Robot Manipulation, Vision-Language-Action | —Unverified | 0 |
| LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | Oct 7, 2024 | Vision-Language-Action | —Unverified | 0 |
| Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | Oct 2, 2024 | Vision-Language-Action | —Unverified | 0 |
| ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | Sep 23, 2024 | Vision-Language-Action | —Unverified | 0 |
| Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | Sep 20, 2024 | Vision-Language-Action | —Unverified | 0 |
| HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | Sep 12, 2024 | Vision-Language-Action | —Unverified | 0 |
| OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | Sep 5, 2024 | Autonomous Driving, Motion Planning | —Unverified | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous Driving, Caption Generation | —Unverified | 0 |
| Robotic Control via Embodied Chain-of-Thought Reasoning | Jul 11, 2024 | Vision-Language-Action | —Unverified | 0 |
| Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | Jul 10, 2024 | Common Sense Reasoning, Vision-Language-Action | —Unverified | 0 |
| OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | Jun 27, 2024 | Decoder, Imitation Learning | —Unverified | 0 |
| Towards Natural Language-Driven Assembly Using Foundation Models | Jun 23, 2024 | Friction, Vision-Language-Action | —Unverified | 0 |