SOTAVerified

Vision-Language-Action

Papers

Showing 1–50 of 157 papers

Title | Status | Hype
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11
OpenVLA: An Open-Source Vision-Language-Action Model | Code | 9
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | Code | 5
ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Code | 5
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5
A Survey on Vision-Language-Action Models for Embodied AI | Code | 4
A Survey on Vision-Language-Action Models for Autonomous Driving | Code | 4
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Code | 4
WorldVLA: Towards Autoregressive Action World Model | Code | 4
PointVLA: Injecting the 3D World into Vision-Language-Action Models | Code | 4
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | Code | 3
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | Code | 3
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents | Code | 3
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy | Code | 3
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | Code | 3
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning | Code | 3
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Code | 3
Latent Action Pretraining from Videos | Code | 3
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Code | 3
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | Code | 3
Real-Time Execution of Action Chunking Flow Policies | Code | 3
A Comprehensive Survey on Continual Learning in Generative Models | Code | 2
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Code | 2
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2
Vision Language Action Models in Robotic Manipulation: A Systematic Review | Code | 2
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2
An Embodied Generalist Agent in 3D World | Code | 2
Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends | Code | 2
Diffusion Transformer Policy | Code | 2
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Code | 2
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Code | 1
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1
Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks | Code | 1
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Adversarial Attacks on Robotic Vision Language Action Models | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning | Code | 0
Perceptual Quality Assessment for Embodied AI | Code | 0
Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework | Code | 0
TGRPO :Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Code | 0

No leaderboard results yet.