SOTAVerified

Vision-Language-Action

Papers

Showing 51–100 of 157 papers

| Title | Status | Hype |
|---|---|---|
| BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization | | 0 |
| Perceptual Quality Assessment for Embodied AI | Code | 0 |
| Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | | 0 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2 |
| EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | | 0 |
| FLARE: Robot Learning with Implicit World Modeling | | 0 |
| RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | Code | 1 |
| Conditioning Matters: Training Diffusion Policies is Faster Than You Think | | 0 |
| RT-cache: Efficient Robot Trajectory Retrieval System | | 0 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1 |
| Pixel Motion as Universal Representation for Robot Control | | 0 |
| 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks | | 0 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | Code | 5 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1 |
| Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | | 0 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | Code | 3 |
| Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets | | 0 |
| NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | | 0 |
| π_0.5: a Vision-Language-Action Model with Open-World Generalization | | 0 |
| GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents | Code | 3 |
| OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning | | 0 |
| Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning | | 0 |
| OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Code | 4 |
| CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | | 0 |
| MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | | 0 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Code | 2 |
| GR00T N1: An Open Foundation Model for Generalist Humanoid Robots | | 0 |
| MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation | | 0 |
| ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis | | 0 |
| HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model | | 0 |
| CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2 |
| MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models | | 0 |
| PointVLA: Injecting the 3D World into Vision-Language-Action Models | Code | 4 |
| Refined Policy Distillation: From VLA Generalists to RL Experts | | 0 |
| SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning | | 0 |
| OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction | | 0 |
| Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding | | 0 |
| A Taxonomy for Evaluating Generalist Robot Policies | | 0 |
| DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping | | 0 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5 |
| ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration | | 0 |
| Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models | | 0 |
| Evolution 6.0: Evolving Robotic Capabilities Through Generative Design | | 0 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1 |
| GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | | 0 |
| DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control | Code | 1 |
| ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Code | 3 |
| HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation | | 0 |
| Survey on Vision-Language-Action Models | | 0 |
Page 2 of 4
