| Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing | Jun 10, 2025 | Retrieval-augmented Generation, Vision-Language-Action | Unverified | 0 |
| Real-Time Execution of Action Chunking Flow Policies | Jun 9, 2025 | Chunking, Vision-Language-Action | Code Available | 3 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Jun 9, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Jun 9, 2025 | Quantization, Vision-Language-Action | Code Available | 2 |
| Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework | Jun 9, 2025 | Denoising, Vision-Language-Action | Code Available | 0 |
| Robotic Policy Learning via Human-assisted Action Preference Optimization | Jun 8, 2025 | Vision-Language-Action | Unverified | 0 |
| RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation | Jun 7, 2025 | Vision-Language-Action | Unverified | 0 |
| DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models | Jun 6, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Adversarial Attacks on Robotic Vision Language Action Models | Jun 3, 2025 | Vision-Language-Action | Code Available | 1 |
| ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding | Jun 2, 2025 | Action Recognition, Video Understanding | Unverified | 0 |
| SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Jun 2, 2025 | Action Generation, GPU | Code Available | 11 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | Jun 1, 2025 | Image Generation, Large Language Model | Unverified | 0 |
| LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks | May 31, 2025 | Task Planning, Vision-Language-Action | Unverified | 0 |
| Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction | May 30, 2025 | Action Generation, Optical Flow Estimation | Unverified | 0 |
| Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | May 29, 2025 | Autonomous Driving, Diagnostic | Code Available | 3 |
| TrackVLA: Embodied Visual Tracking in the Wild | May 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | May 29, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | May 28, 2025 | Contact-rich Manipulation, Mixture-of-Experts | Unverified | 0 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation Learning, Math | Code Available | 1 |
| Hume: Introducing System-2 Thinking in Visual-Language-Action Model | May 27, 2025 | Denoising, Vision-Language-Action | Unverified | 0 |
| Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review | May 26, 2025 | Decision Making Under Uncertainty, Sensor Fusion | Unverified | 0 |
| What Can RL Bring to VLA Generalization? An Empirical Study | May 26, 2025 | Reinforcement Learning (RL), Vision-Language-Action | Unverified | 0 |
| VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | May 24, 2025 | GPU, Reinforcement Learning (RL) | Code Available | 3 |
| Interactive Post-Training for Vision-Language-Action Models | May 22, 2025 | Vision-Language-Action | Unverified | 0 |
| DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving | May 22, 2025 | Autonomous Driving, Bench2Drive | Unverified | 0 |