| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | Mar 25, 2025 | Robot Manipulation, Spatial Reasoning | Unverified | 0 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Mar 25, 2025 | Denoising, Robot Manipulation | Code Available | 2 |
| GR00T N1: An Open Foundation Model for Generalist Humanoid Robots | Mar 18, 2025 | Imitation Learning, Vision-Language-Action | Unverified | 0 |
| MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation | Mar 17, 2025 | Motion Planning, Vision-Language-Action | Unverified | 0 |
| ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis | Mar 15, 2025 | Domain Generalization, Robot Manipulation | Unverified | 0 |
| HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | Mar 13, 2025 | Common Sense Reasoning, Denoising | Unverified | 0 |
| CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Mar 12, 2025 | Decision Making, Vision-Language-Action | Code Available | 2 |
| MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | Mar 11, 2025 | Large Language Model, Mixture-of-Experts | Unverified | 0 |
| PointVLA: Injecting the 3D World into Vision-Language-Action Models | Mar 10, 2025 | Imitation Learning, Spatial Reasoning | Code Available | 4 |
| Refined Policy Distillation: From VLA Generalists to RL Experts | Mar 6, 2025 | Vision-Language-Action | Unverified | 0 |
| SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | Mar 5, 2025 | Safe Reinforcement Learning, Safety Alignment | Unverified | 0 |
| OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | Mar 5, 2025 | Vision-Language-Action, Zero-shot Generalization | Unverified | 0 |
| Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding | Mar 4, 2025 | Chunking, Vision-Language-Action | Unverified | 0 |
| A Taxonomy for Evaluating Generalist Robot Policies | Mar 3, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | Feb 28, 2025 | Imitation Learning, Vision-Language-Action | Unverified | 0 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Feb 27, 2025 | Action Generation, Chunking | Code Available | 5 |
| ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | Feb 26, 2025 | Imitation Learning, Object | Unverified | 0 |
| Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | Feb 26, 2025 | Instruction Following, Vision-Language-Action | Unverified | 0 |
| Evolution 6.0: Evolving Robotic Capabilities Through Generative Design | Feb 24, 2025 | Action Generation, Text to 3D | Unverified | 0 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Feb 20, 2025 | Mixture-of-Experts, Question Answering | Code Available | 1 |
| GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Feb 13, 2025 | Contrastive Learning, Video Generation | Unverified | 0 |
| DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control | Feb 9, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Feb 8, 2025 | Q-Learning, Safe Exploration | Code Available | 3 |
| HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation | Feb 8, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| Survey on Vision-Language-Action Models | Feb 7, 2025 | Review Generation, Survey | Unverified | 0 |