SOTAVerified

Vision-Language-Action

Papers

Showing 101–150 of 157 papers

Title | Status | Hype
Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture | – | 0
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation | – | 0
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | – | 0
Improving Vision-Language-Action Model with Online Reinforcement Learning | – | 0
FAST: Efficient Action Tokenization for Vision-Language-Action Models | – | 0
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | – | 0
Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | – | 0
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation | – | 0
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters | – | 0
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | – | 0
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | – | 0
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Code | 3
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | – | 0
Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | – | 0
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | – | 0
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | – | 0
NaVILA: Legged Robot Vision-Language-Action Model for Navigation | – | 0
Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | – | 0
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | – | 0
GRAPE: Generalizing Robot Policy via Preference Alignment | – | 0
ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Code | 5
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2
π_0: A Vision-Language-Action Flow Model for General Robot Control | – | 0
Diffusion Transformer Policy | Code | 2
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | – | 0
Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand | – | 0
Latent Action Pretraining from Videos | Code | 3
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation | – | 0
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | – | 0
Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | – | 0
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | – | 0
Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | – | 0
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Code | 2
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | – | 0
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | – | 0
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | – | 0
Robotic Control via Embodied Chain-of-Thought Reasoning | – | 0
Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | – | 0
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Code | 3
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | – | 0
Towards Natural Language-Driven Assembly Using Foundation Models | – | 0
OpenVLA: An Open-Source Vision-Language-Action Model | Code | 9
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation | – | 0
Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | Code | 0
A Survey on Vision-Language-Action Models for Embodied AI | Code | 4
LEGENT: Open Platform for Embodied Agents | – | 0
Page 3 of 4
