| RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models | Jun 21, 2025 | Synthetic Data Generation, Vision-Language-Action | Unverified | 0 |
| RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models | Jun 21, 2025 | Model Compression, Quantization | Unverified | 0 |
| VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models | Jun 21, 2025 | Action Generation, Continual Learning | Unverified | 0 |
| CapsDT: Diffusion-Transformer for Capsule Robot Manipulation | Jun 19, 2025 | Diagnostic, Robot Manipulation | Unverified | 0 |
| ROSA: Harnessing Robot States for Vision-Language and Action Alignment | Jun 16, 2025 | State Estimation, Vision-Language-Action | Unverified | 0 |
| Block-wise Adaptive Caching for Accelerating Diffusion Policy | Jun 16, 2025 | Action Generation, Denoising | Unverified | 0 |
| LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | Jun 16, 2025 | Instruction Following, Vision-Language-Action | Unverified | 0 |
| From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models | Jun 11, 2025 | Imitation Learning, Vision-Language-Action | Unverified | 0 |
| SAFE: Multitask Failure Detection for Vision-Language-Action Models | Jun 11, 2025 | Conformal Prediction, Vision-Language-Action | Unverified | 0 |
| EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models | Jun 11, 2025 | Vision-Language-Action | Unverified | 0 |
| FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency | Jun 10, 2025 | Action Generation, Image Generation | Unverified | 0 |
| An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | Jun 10, 2025 | Action Generation, Image Captioning | Unverified | 0 |
| TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Jun 10, 2025 | Reinforcement Learning | Code Available | 0 |
| Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing | Jun 10, 2025 | Retrieval-augmented Generation, Vision-Language-Action | Unverified | 0 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Jun 9, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework | Jun 9, 2025 | Denoising, Vision-Language-Action | Code Available | 0 |
| Robotic Policy Learning via Human-assisted Action Preference Optimization | Jun 8, 2025 | Vision-Language-Action | Unverified | 0 |
| RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation | Jun 7, 2025 | Vision-Language-Action | Unverified | 0 |
| DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Jun 6, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding | Jun 2, 2025 | Action Recognition, Video Understanding | Unverified | 0 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | Jun 1, 2025 | Image Generation, Large Language Model | Unverified | 0 |
| LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks | May 31, 2025 | Task Planning, Vision-Language-Action | Unverified | 0 |
| Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction | May 30, 2025 | Action Generation, Optical Flow Estimation | Unverified | 0 |
| TrackVLA: Embodied Visual Tracking in the Wild | May 29, 2025 | Language Modeling | Unverified | 0 |
| Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | May 29, 2025 | Continuous Control | Unverified | 0 |