| Interactive Post-Training for Vision-Language-Action Models | May 22, 2025 | Vision-Language-Action | Unverified | 0 |
| BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization | May 22, 2025 | Backdoor Attack, Vision-Language-Action | Unverified | 0 |
| FLARE: Robot Learning with Implicit World Modeling | May 21, 2025 | Imitation Learning, Vision-Language-Action | Unverified | 0 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | May 21, 2025 | Vision-Language-Action, Zero-shot Generalization | Code Available | 2 |
| EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | May 21, 2025 | Motion Planning, Vision-Language-Action | Unverified | 0 |
| Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | May 21, 2025 | Object, Pose Estimation | Unverified | 0 |
| RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | May 18, 2025 | Vision-Language-Action | Code Available | 1 |
| Conditioning Matters: Training Diffusion Policies is Faster Than You Think | May 16, 2025 | Vision-Language-Action | Unverified | 0 |
| RT-cache: Efficient Robot Trajectory Retrieval System | May 14, 2025 | Retrieval, Vision-Language-Action | Unverified | 0 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot Manipulation, Spatial Reasoning | Code Available | 1 |
| Pixel Motion as Universal Representation for Robot Control | May 12, 2025 | Vision-Language-Action | Unverified | 0 |
| 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks | May 9, 2025 | Vision-Language-Action | Unverified | 0 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | May 9, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 5 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | May 8, 2025 | Benchmarking, Prompt Engineering | Code Available | 1 |
| Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | May 7, 2025 | Autonomous Vehicles, Natural Language Understanding | Unverified | 0 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | May 6, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 3 |
| Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets | May 6, 2025 | Autonomous Vehicles, TAG | Unverified | 0 |
| NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | Apr 28, 2025 | Task Planning, Vision-Language-Action | Unverified | 0 |
| π_0.5: a Vision-Language-Action Model with Open-World Generalization | Apr 22, 2025 | Transfer Learning, Vision-Language-Action | Unverified | 0 |
| GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Apr 14, 2025 | Vision-Language-Action | Code Available | 3 |
| OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning | Apr 9, 2025 | Vision-Language-Action | Unverified | 0 |
| Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning | Apr 1, 2025 | Reinforcement Learning (RL), Vision-Language-Action | Unverified | 0 |
| OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Mar 30, 2025 | Autonomous Driving, Decision Making | Code Available | 4 |
| CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | Mar 27, 2025 | Vision-Language-Action | Unverified | 0 |
| MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | Mar 26, 2025 | Knowledge Distillation, Mixture-of-Experts | Unverified | 0 |