SOTAVerified

Vision-Language-Action Papers

Showing 26–50 of 157 papers

| Title | Status | Hype |
| --- | --- | --- |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Code | 2 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Code | 2 |
| CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Code | 2 |
| UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2 |
| Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2 |
| Diffusion Transformer Policy | Code | 2 |
| TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Code | 2 |
| An Embodied Generalist Agent in 3D World | Code | 2 |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2 |
| VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Code | 1 |
| Adversarial Attacks on Robotic Vision Language Action Models | Code | 1 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1 |
| RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | Code | 1 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1 |
| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1 |
| DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control | Code | 1 |
| Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1 |
| Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks | Code | 1 |
| AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation | | 0 |
| LaViPlan : Language-Guided Visual Path Planning with RLVR | | 0 |
| Unified Vision-Language-Action Model | | 0 |
| CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation | | 0 |
Page 2 of 7

No leaderboard results yet.