| CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | Nov 29, 2024 | Quantization, Vision-Language-Action | Unverified | 0 |
| GRAPE: Generalizing Robot Policy via Preference Alignment | Nov 28, 2024 | Vision-Language-Action | Unverified | 0 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Nov 26, 2024 | Instruction Following, Natural Language Visual Grounding | Code Available | 5 |
| Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Nov 18, 2024 | Vision-Language-Action | Code Available | 2 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Nov 4, 2024 | Action Generation, Benchmarking | Code Available | 1 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 |
| π_0: A Vision-Language-Action Flow Model for General Robot Control | Oct 31, 2024 | Language Modeling | Unverified | 0 |
| Diffusion Transformer Policy | Oct 21, 2024 | Denoising, Vision-Language-Action | Code Available | 2 |
| A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | Oct 21, 2024 | Decision Making, Vision-Language-Action | Unverified | 0 |
| Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand | Oct 17, 2024 | Vision-Language-Action | Unverified | 0 |