SOTAVerified

Vision-Language-Action Papers

Showing 1–50 of 157 papers

Title | Status | Hype
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11
OpenVLA: An Open-Source Vision-Language-Action Model | Code | 9
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5
ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Code | 5
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | Code | 5
A Survey on Vision-Language-Action Models for Autonomous Driving | Code | 4
A Survey on Vision-Language-Action Models for Embodied AI | Code | 4
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Code | 4
WorldVLA: Towards Autoregressive Action World Model | Code | 4
PointVLA: Injecting the 3D World into Vision-Language-Action Models | Code | 4
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Code | 3
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | Code | 3
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Code | 3
Real-Time Execution of Action Chunking Flow Policies | Code | 3
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Code | 3
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | Code | 3
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Code | 3
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Code | 3
Latent Action Pretraining from Videos | Code | 3
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | Code | 3
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning | Code | 3
A Comprehensive Survey on Continual Learning in Generative Models | Code | 2
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2
Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Code | 2
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Code | 2
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2
An Embodied Generalist Agent in 3D World | Code | 2
Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends | Code | 2
Diffusion Transformer Policy | Code | 2
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1
Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks | Code | 1
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Code | 1
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Adversarial Attacks on Robotic Vision Language Action Models | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | - | 0
Conditioning Matters: Training Diffusion Policies is Faster Than You Think | - | 0
Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets | - | 0
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | - | 0
Page 1 of 4