| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| Geometry-aware 4D Video Generation for Robot Manipulation | Jul 1, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| CapsDT: Diffusion-Transformer for Capsule Robot Manipulation | Jun 19, 2025 | Diagnostic, Robot Manipulation | Unverified | 0 |
| Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation | Jun 18, 2025 | Hallucination, Imitation Learning | Unverified | 0 |
| SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning | Jun 17, 2025 | Density Estimation, Robot Manipulation | Unverified | 0 |
| What Matters in Learning from Large-Scale Datasets for Robot Manipulation | Jun 16, 2025 | Diversity, Imitation Learning | Unverified | 0 |
| Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success | Jun 12, 2025 | Robot Manipulation, Semantic Segmentation | Unverified | 0 |
| BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models | Jun 9, 2025 | Robot Manipulation, Vision-Language-Action | Unverified | 0 |
| 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model | Jun 6, 2025 | Optical Flow Estimation, Robot Manipulation | Code Available | 1 |
| OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation | Jun 1, 2025 | Image Generation, Large Language Model | Unverified | 0 |
| Bi-Manual Joint Camera Calibration and Scene Representation | May 30, 2025 | Camera Calibration, Robot Manipulation | Unverified | 0 |
| PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation | May 27, 2025 | Instruction Following, Object | Unverified | 0 |
| WorldEval: World Model as Real-World Robot Policies Evaluator | May 25, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| Is Single-View Mesh Reconstruction Ready for Robotics? | May 23, 2025 | 3D Reconstruction, Benchmarking | Unverified | 0 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | May 22, 2025 | 3D geometry, Robot Manipulation | Unverified | 0 |
| Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | May 21, 2025 | Object, Pose Estimation | Unverified | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation, Descriptive | Unverified | 0 |
| Vid2World: Crafting Video Diffusion Models to Interactive World Models | May 20, 2025 | Robot Manipulation, Sequential Decision Making | Unverified | 0 |
| RLVR-World: Training World Models with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | May 19, 2025 | Multimodal Reasoning, Robot Manipulation | Unverified | 0 |
| Object-Centric Representations Improve Policy Generalization in Robot Manipulation | May 16, 2025 | Optical Character Recognition (OCR), Robot Manipulation | Unverified | 0 |
| Zero-Shot Visual Generalization in Robot Manipulation | May 16, 2025 | Imitation Learning, Representation Learning | Unverified | 0 |
| Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views | May 16, 2025 | Grasp Generation, Novel View Synthesis | Unverified | 0 |
| LODGE: Joint Hierarchical Task Planning and Learning of Domain Models with Grounded Execution | May 15, 2025 | Robot Manipulation, Task Planning | Unverified | 0 |
| EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation | May 15, 2025 | Robot Manipulation | Unverified | 0 |