SOTAVerified

Vision-Language-Action

Papers

Showing 51–75 of 157 papers

| Title | Status | Hype |
|---|---|---|
| RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models | | 0 |
| RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models | | 0 |
| VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models | | 0 |
| CapsDT: Diffusion-Transformer for Capsule Robot Manipulation | | 0 |
| ROSA: Harnessing Robot States for Vision-Language and Action Alignment | | 0 |
| Block-wise Adaptive Caching for Accelerating Diffusion Policy | | 0 |
| LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | | 0 |
| From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | | 0 |
| SAFE: Multitask Failure Detection for Vision-Language-Action Models | | 0 |
| EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models | | 0 |
| FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency | | 0 |
| An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | | 0 |
| TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Code | 0 |
| Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing | | 0 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | | 0 |
| Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework | Code | 0 |
| Robotic Policy Learning via Human-assisted Action Preference Optimization | | 0 |
| RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation | | 0 |
| DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | | 0 |
| ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding | | 0 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | | 0 |
| LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks | | 0 |
| Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction | | 0 |
| TrackVLA: Embodied Visual Tracking in the Wild | | 0 |
| Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | | 0 |
Page 3 of 7

No leaderboard results yet.