| Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture | Feb 6, 2025 | Object, Vision-Language-Action | Unverified | 0 |
| VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation | Feb 4, 2025 | Decision Making, Sequential Decision Making | Unverified | 0 |
| UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | Jan 31, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| Improving Vision-Language-Action Model with Online Reinforcement Learning | Jan 28, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| FAST: Efficient Action Tokenization for Vision-Language-Action Models | Jan 16, 2025 | Vision-Language-Action | Unverified | 0 |
| UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Jan 9, 2025 | Decision Making, Language Modeling | Code Available | 2 |
| Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | Jan 8, 2025 | Robot Manipulation, Text Generation | Unverified | 0 |
| Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | Jan 6, 2025 | Vision-Language-Action | Unverified | 0 |
| Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation | Jan 1, 2025 | Vision-Language-Action | Unverified | 0 |
| SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters | Jan 1, 2025 | Vision-Language-Action | Unverified | 0 |
| VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | Dec 24, 2024 | Common Sense Reasoning, Transfer Learning | Unverified | 0 |
| QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | Dec 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Dec 18, 2024 | Representation Learning, Robot Manipulation | Code Available | 3 |
| RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | Dec 18, 2024 | Diversity, Imitation Learning | Unverified | 0 |
| Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | Dec 15, 2024 | Vision-Language-Action | Unverified | 0 |
| TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | Dec 13, 2024 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | Dec 9, 2024 | Vision-Language-Action | Unverified | 0 |
| NaVILA: Legged Robot Vision-Language-Action Model for Navigation | Dec 5, 2024 | Navigate, Vision and Language Navigation | Unverified | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | Dec 2, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning, Scheduling | Code Available | 2 |
| CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | Nov 29, 2024 | Quantization, Vision-Language-Action | Unverified | 0 |
| GRAPE: Generalizing Robot Policy via Preference Alignment | Nov 28, 2024 | Vision-Language-Action | Unverified | 0 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Nov 26, 2024 | Instruction Following, Natural Language Visual Grounding | Code Available | 5 |
| Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Nov 18, 2024 | Vision-Language-Action | Code Available | 2 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 |