| Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks | Apr 2, 2024 | Vision-Language-Action | Code Available | 1 |
| 3D-VLA: A 3D Vision-Language-Action Generative World Model | Mar 14, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| General-purpose foundation models for increased autonomy in robot-assisted surgery | Jan 1, 2024 | Vision-Language-Action | Unverified | 0 |
| QUAR-VLA: Vision-Language-Action Model for Quadruped Robots | Dec 22, 2023 | Decision Making, Vision-Language-Action | Unverified | 0 |
| SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention | Dec 4, 2023 | Vision-Language-Action | Unverified | 0 |
| An Embodied Generalist Agent in 3D World | Nov 18, 2023 | 3D dense captioning, 3D Question Answering (3D-QA) | Code Available | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Jul 28, 2023 | Object, Question Answering | Code Available | 2 |