SOTAVerified

Vision-Language-Action

Papers

Showing 101–125 of 157 papers

| Title | Status | Hype |
| --- | --- | --- |
| Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture | — | 0 |
| VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation | — | 0 |
| UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | — | 0 |
| Improving Vision-Language-Action Model with Online Reinforcement Learning | — | 0 |
| FAST: Efficient Action Tokenization for Vision-Language-Action Models | — | 0 |
| UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2 |
| Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | — | 0 |
| Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | — | 0 |
| Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation | — | 0 |
| SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters | — | 0 |
| VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | — | 0 |
| QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | — | 0 |
| Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Code | 3 |
| RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | — | 0 |
| Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | — | 0 |
| TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | — | 0 |
| Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | — | 0 |
| NaVILA: Legged Robot Vision-Language-Action Model for Navigation | — | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | — | 0 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2 |
| CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | — | 0 |
| GRAPE: Generalizing Robot Policy via Preference Alignment | — | 0 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Code | 5 |
| Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1 |
Page 5 of 7
