SOTAVerified

Vision-Language-Action Papers

Showing 26–50 of 157 papers

Title | Status | Hype
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Code | 2
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics | Code | 2
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Code | 2
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation | Code | 2
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2
An Embodied Generalist Agent in 3D World | Code | 2
Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends | Code | 2
Diffusion Transformer Policy | Code | 2
UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Code | 2
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1
Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks | Code | 1
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting | Code | 1
RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control | Code | 1
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | Code | 1
Adversarial Attacks on Robotic Vision Language Action Models | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | - | 0
Conditioning Matters: Training Diffusion Policies is Faster Than You Think | - | 0
Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets | - | 0
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation | - | 0
Page 2 of 7

No leaderboard results yet.