SOTAVerified

Vision-Language-Action

Papers

Showing 101–150 of 157 papers

Title (Status is unset and Hype is 0 for every paper listed below)

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent
Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
What Can RL Bring to VLA Generalization? An Empirical Study
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing
FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
3D-VLA: A 3D Vision-Language-Action Generative World Model
Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding
Block-wise Adaptive Caching for Accelerating Diffusion Policy
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
CapsDT: Diffusion-Transformer for Capsule Robot Manipulation
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
Conditioning Matters: Training Diffusion Policies is Faster Than You Think
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models
DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy
Evolution 6.0: Evolving Robotic Capabilities Through Generative Design
FAST: Efficient Action Tokenization for Vision-Language-Action Models
FLARE: Robot Learning with Implicit World Modeling
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models
General-purpose foundation models for increased autonomy in robot-assisted surgery
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
GRAPE: Generalizing Robot Policy via Preference Alignment
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Improving Vision-Language-Action Model with Online Reinforcement Learning
Page 3 of 4

No leaderboard results yet.