| HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images | Mar 3, 2026 | | —Unverified | 2 |
| Proact-VL: A Proactive VideoLLM for Real-Time AI Companions | Mar 3, 2026 | | —Unverified | 2 |
| D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI | Mar 3, 2026 | | —Unverified | 2 |
| StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? | Mar 2, 2026 | | —Unverified | 2 |
| SciDER: Scientific Data-centric End-to-end Researcher | Mar 2, 2026 | | —Unverified | 2 |
| WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories | Mar 2, 2026 | | —Unverified | 2 |
| Spotlight on Token Perception for Multimodal Reinforcement Learning | Mar 2, 2026 | | —Unverified | 2 |
| Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy | Mar 2, 2026 | | —Unverified | 2 |
| CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives | Mar 1, 2026 | | —Unverified | 2 |
| VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection | Mar 1, 2026 | | —Unverified | 2 |
| SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs | Mar 1, 2026 | | —Unverified | 2 |
| SoFlow: Solution Flow Models for One-Step Generative Modeling | Mar 1, 2026 | | —Unverified | 2 |
| CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering | Feb 28, 2026 | | —Unverified | 2 |
| Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? | Feb 28, 2026 | | —Unverified | 2 |
| ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning | Feb 28, 2026 | | —Unverified | 2 |
| OmniGAIA: Towards Native Omni-Modal AI Agents | Feb 28, 2026 | | —Unverified | 2 |
| EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing | Feb 28, 2026 | | —Unverified | 2 |
| Enhancing Spatial Understanding in Image Generation via Reward Modeling | Feb 27, 2026 | | —Unverified | 2 |
| MLP Memory: A Retriever-Pretrained Memory for Large Language Models | Feb 27, 2026 | | —Unverified | 2 |
| From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors | Feb 27, 2026 | | —Unverified | 2 |
| Unified Multimodal Models as Auto-Encoders | Feb 26, 2026 | | —Unverified | 2 |
| Solaris: Building a Multiplayer Video World Model in Minecraft | Feb 26, 2026 | | —Unverified | 2 |
| The Trinity of Consistency as a Defining Principle for General World Models | Feb 26, 2026 | | —Unverified | 2 |
| Deforming Videos to Masks: Flow Matching for Referring Video Segmentation | Feb 26, 2026 | | —Unverified | 2 |
| Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation | Feb 26, 2026 | | —Unverified | 2 |
| EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents | Feb 26, 2026 | | —Unverified | 2 |
| G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior | Feb 26, 2026 | | —Unverified | 2 |
| LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? | Feb 26, 2026 | | —Unverified | 2 |
| Training-free Mixed-Resolution Latent Upsampling for Spatially Accelerated Diffusion Transformers | Feb 25, 2026 | | —Unverified | 2 |
| RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind | Feb 25, 2026 | | —Unverified | 2 |
| VecGlypher: Unified Vector Glyph Generation with Language Models | Feb 25, 2026 | | —Unverified | 2 |
| Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models | Feb 25, 2026 | | —Unverified | 2 |
| Should We Still Pretrain Encoders with Masked Language Modeling? | Feb 24, 2026 | | —Unverified | 2 |
| SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation | Feb 24, 2026 | | —Unverified | 2 |
| Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device | Feb 24, 2026 | | —Unverified | 2 |
| NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents | Feb 24, 2026 | | —Unverified | 2 |
| PyVision-RL: Forging Open Agentic Vision Models via RL | Feb 24, 2026 | | —Unverified | 2 |
| UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation | Feb 24, 2026 | | —Unverified | 2 |
| Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight | Feb 23, 2026 | | —Unverified | 2 |
| OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot | Feb 23, 2026 | | —Unverified | 2 |
| On Predictability of Reinforcement Learning Dynamics for Large Language Models | Feb 22, 2026 | | —Unverified | 2 |
| TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics | Feb 22, 2026 | | —Unverified | 2 |
| Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs | Feb 21, 2026 | | —Unverified | 2 |
| Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control | Feb 21, 2026 | | —Unverified | 2 |
| Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling | Feb 21, 2026 | | —Unverified | 2 |
| SAGE: Scalable Agentic 3D Scene Generation for Embodied AI | Feb 20, 2026 | | —Unverified | 2 |
| VLANeXt: Recipes for Building Strong VLA Models | Feb 20, 2026 | | —Unverified | 2 |
| SimVLA: A Simple VLA Baseline for Robotic Manipulation | Feb 20, 2026 | | —Unverified | 2 |
| UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing | Feb 20, 2026 | | —Unverified | 2 |
| MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation | Feb 19, 2026 | | —Unverified | 2 |