SOTAVerified

Vision-Language-Action Papers

Showing 101–150 of 157 papers

Title | Status | Hype
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation | | 0
ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis | | 0
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | | 0
MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | | 0
Refined Policy Distillation: From VLA Generalists to RL Experts | | 0
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | | 0
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | | 0
Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding | | 0
A Taxonomy for Evaluating Generalist Robot Policies | | 0
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | | 0
ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | | 0
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | | 0
Evolution 6.0: Evolving Robotic Capabilities Through Generative Design | | 0
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | | 0
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation | | 0
Survey on Vision-Language-Action Models | | 0
Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture | | 0
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation | | 0
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | | 0
Improving Vision-Language-Action Model with Online Reinforcement Learning | | 0
FAST: Efficient Action Tokenization for Vision-Language-Action Models | | 0
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding | | 0
Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches | | 0
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation | | 0
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters | | 0
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks | | 0
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning | | 0
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation | | 0
Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience | | 0
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | | 0
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks | | 0
NaVILA: Legged Robot Vision-Language-Action Model for Navigation | | 0
Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | | 0
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | | 0
GRAPE: Generalizing Robot Policy via Preference Alignment | | 0
π_0: A Vision-Language-Action Flow Model for General Robot Control | | 0
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM | | 0
Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand | | 0
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation | | 0
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation | | 0
Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | | 0
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models | | 0
Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | | 0
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | | 0
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | | 0
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | | 0
Robotic Control via Embodied Chain-of-Thought Reasoning | | 0
Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | | 0
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | | 0
Towards Natural Language-Driven Assembly Using Foundation Models | | 0
Page 3 of 4
