
Vision-Language-Action

Papers

Showing 126–150 of 157 papers

| Title | Status | Hype |
| --- | --- | --- |
| VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | | 0 |
| QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | | 0 |
| RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | | 0 |
| Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | | 0 |
| TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | | 0 |
| Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | | 0 |
| NaVILA: Legged Robot Vision-Language-Action Model for Navigation | | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | | 0 |
| CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | | 0 |
| GRAPE: Generalizing Robot Policy via Preference Alignment | | 0 |
| π_0: A Vision-Language-Action Flow Model for General Robot Control | | 0 |
| A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | | 0 |
| Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand | | 0 |
| Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation | | 0 |
| LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | | 0 |
| Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | | 0 |
| ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | | 0 |
| Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | | 0 |
| HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | | 0 |
| OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | | 0 |
| Robotic Control via Embodied Chain-of-Thought Reasoning | | 0 |
| Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | | 0 |
| OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | | 0 |
| Towards Natural Language-Driven Assembly Using Foundation Models | | 0 |
Page 6 of 7

No leaderboard results yet.