SOTAVerified

Vision-Language-Action

Papers

Showing 51-100 of 157 papers

| Title | Status | Hype |
| --- | --- | --- |
| RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models | | 0 |
| RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models | | 0 |
| VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models | | 0 |
| CapsDT: Diffusion-Transformer for Capsule Robot Manipulation | | 0 |
| ROSA: Harnessing Robot States for Vision-Language and Action Alignment | | 0 |
| LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | | 0 |
| Block-wise Adaptive Caching for Accelerating Diffusion Policy | | 0 |
| EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models | | 0 |
| From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | | 0 |
| SAFE: Multitask Failure Detection for Vision-Language-Action Models | | 0 |
| Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing | | 0 |
| An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | | 0 |
| TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Code | 0 |
| FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency | | 0 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | | 0 |
| Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework | Code | 0 |
| Robotic Policy Learning via Human-assisted Action Preference Optimization | | 0 |
| RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation | | 0 |
| DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | | 0 |
| ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding | | 0 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | | 0 |
| LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks | | 0 |
| Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction | | 0 |
| TrackVLA: Embodied Visual Tracking in the Wild | | 0 |
| Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | | 0 |
| ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | | 0 |
| Hume: Introducing System-2 Thinking in Visual-Language-Action Model | | 0 |
| Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review | | 0 |
| What Can RL Bring to VLA Generalization? An Empirical Study | | 0 |
| BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization | | 0 |
| Interactive Post-Training for Vision-Language-Action Models | | 0 |
| DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving | | 0 |
| Perceptual Quality Assessment for Embodied AI | Code | 0 |
| Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | | 0 |
| EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy | | 0 |
| FLARE: Robot Learning with Implicit World Modeling | | 0 |
| Conditioning Matters: Training Diffusion Policies is Faster Than You Think | | 0 |
| RT-cache: Efficient Robot Trajectory Retrieval System | | 0 |
| Pixel Motion as Universal Representation for Robot Control | | 0 |
| 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks | | 0 |
| Vision-Language-Action Models: Concepts, Progress, Applications and Challenges | | 0 |
| Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets | | 0 |
| NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | | 0 |
| π_0.5: a Vision-Language-Action Model with Open-World Generalization | | 0 |
| OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning | | 0 |
| Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning | | 0 |
| CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | | 0 |
| MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | | 0 |
| GR00T N1: An Open Foundation Model for Generalist Humanoid Robots | | 0 |

No leaderboard results yet.