| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| Geometry-aware 4D Video Generation for Robot Manipulation | Jul 1, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| CapsDT: Diffusion-Transformer for Capsule Robot Manipulation | Jun 19, 2025 | Diagnostic, Robot Manipulation | Unverified | 0 |
| Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation | Jun 18, 2025 | Hallucination, Imitation Learning | Unverified | 0 |
| SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning | Jun 17, 2025 | Density Estimation, Robot Manipulation | Unverified | 0 |
| What Matters in Learning from Large-Scale Datasets for Robot Manipulation | Jun 16, 2025 | Diversity, Imitation Learning | Unverified | 0 |
| Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success | Jun 12, 2025 | Robot Manipulation, Semantic Segmentation | Unverified | 0 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Jun 9, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model | Jun 6, 2025 | Optical Flow Estimation, Robot Manipulation | Code Available | 1 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | Jun 1, 2025 | Image Generation, Large Language Model | Unverified | 0 |
| Bi-Manual Joint Camera Calibration and Scene Representation | May 30, 2025 | Camera Calibration, Robot Manipulation | Unverified | 0 |
| PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation | May 27, 2025 | Instruction Following, Object | Unverified | 0 |
| WorldEval: World Model as Real-World Robot Policies Evaluator | May 25, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| Is Single-View Mesh Reconstruction Ready for Robotics? | May 23, 2025 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | May 22, 2025 | 3D geometry, Robot Manipulation | Unverified | 0 |
| Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | May 21, 2025 | Object, Pose Estimation | Unverified | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | Unverified | 0 |
| Vid2World: Crafting Video Diffusion Models to Interactive World Models | May 20, 2025 | Robot Manipulation, Sequential Decision Making | Unverified | 0 |
| RLVR-World: Training World Models with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | May 19, 2025 | Multimodal Reasoning, Robot Manipulation | Unverified | 0 |
| Object-Centric Representations Improve Policy Generalization in Robot Manipulation | May 16, 2025 | Optical Character Recognition (OCR), Robot Manipulation | Unverified | 0 |
| Zero-Shot Visual Generalization in Robot Manipulation | May 16, 2025 | Imitation Learning, Representation Learning | Unverified | 0 |
| Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views | May 16, 2025 | Grasp Generation, Novel View Synthesis | Unverified | 0 |
| LODGE: Joint Hierarchical Task Planning and Learning of Domain Models with Grounded Execution | May 15, 2025 | Robot Manipulation, Task Planning | Unverified | 0 |
| FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation | May 15, 2025 | Robot Manipulation, Semantic Similarity | Unverified | 0 |
| NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning | May 15, 2025 | Novel View Synthesis, Robot Manipulation | Unverified | 0 |
| EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation | May 15, 2025 | Robot Manipulation | Unverified | 0 |
| IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | May 15, 2025 | Efficient Exploration, Imitation Learning | Code Available | 0 |
| ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation | May 14, 2025 | Benchmarking, Deformable Object Manipulation | Unverified | 0 |
| Mini Diffuser: Fast Multi-task Diffusion Policy Training Using Two-level Mini-batches | May 14, 2025 | Action Generation, Image Generation | Code Available | 1 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot Manipulation, Spatial Reasoning | Code Available | 1 |
| X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real | May 11, 2025 | Domain Adaptation, Imitation Learning | Unverified | 0 |
| UniVLA: Learning to Act Anywhere with Task-centric Latent Actions | May 9, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 5 |
| Efficient Sensorimotor Learning for Open-world Robot Manipulation | May 7, 2025 | Robot Manipulation | Unverified | 0 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | May 6, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 3 |
| The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning | May 6, 2025 | CPU, Gaussian Processes | Unverified | 0 |
| Sim2Real Transfer for Vision-Based Grasp Verification | May 5, 2025 | Object, object-detection | Code Available | 0 |
| RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation | May 3, 2025 | Robot Manipulation | Unverified | 0 |
| SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation | Apr 22, 2025 | Action Generation, Imitation Learning | Unverified | 0 |
| Trajectory Adaptation using Large Language Models | Apr 17, 2025 | Robot Manipulation | Unverified | 0 |
| Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation | Apr 9, 2025 | 3D Assembly, Pose Estimation | Unverified | 0 |
| Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Apr 3, 2025 | 3D Object Detection, cross-modal alignment | Code Available | 1 |
| Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation | Apr 1, 2025 | Prompt Learning, Robot Manipulation | Unverified | 0 |
| HACTS: a Human-As-Copilot Teleoperation System for Robot Learning | Mar 31, 2025 | Autonomous Vehicles, Imitation Learning | Unverified | 0 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Mar 31, 2025 | Robot Manipulation, Scheduling | Code Available | 2 |
| REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation | Mar 28, 2025 | Robot Manipulation, Task Planning | Unverified | 0 |
| MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation | Mar 26, 2025 | Knowledge Distillation, Mixture-of-Experts | Unverified | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | Mar 25, 2025 | Robot Manipulation, Spatial Reasoning | Unverified | 0 |
| Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy | Mar 25, 2025 | Denoising, Robot Manipulation | Code Available | 2 |
| PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos | Mar 23, 2025 | 4D reconstruction, Deformable Object Manipulation | Code Available | 3 |