| HY-WU (Part I): An Extensible Functional Neural Memory Framework and An Instantiation in Text-Guided Image Editing | Mar 7, 2026 | | —Unverified | 3 |
| LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory | Mar 3, 2026 | | —Unverified | 3 |
| Human3R: Everyone Everywhere All at Once | Mar 3, 2026 | | —Unverified | 3 |
| Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing | Mar 3, 2026 | | —Unverified | 3 |
| tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction | Mar 2, 2026 | | —Unverified | 3 |
| EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering | Mar 2, 2026 | | —Unverified | 3 |
| FireRed-OCR Technical Report | Mar 2, 2026 | | —Unverified | 3 |
| Latent Diffusion Model without Variational Autoencoder | Mar 2, 2026 | | —Unverified | 3 |
| RLP: Reinforcement as a Pretraining Objective | Mar 1, 2026 | | —Unverified | 3 |
| Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision | Mar 1, 2026 | | —Unverified | 3 |
| GEM: A Gym for Agentic LLMs | Mar 1, 2026 | | —Unverified | 3 |
| OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence | Feb 26, 2026 | | —Unverified | 3 |
| The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution | Feb 26, 2026 | | —Unverified | 3 |
| Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding | Feb 26, 2026 | | —Unverified | 3 |
| EO-1: An Open Unified Embodied Foundation Model for General Robot Control | Feb 25, 2026 | | —Unverified | 3 |
| Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering | Feb 25, 2026 | | —Unverified | 3 |
| A Survey of Data Agents: Emerging Paradigm or Overstated Hype? | Feb 24, 2026 | | —Unverified | 3 |
| Much Ado About Noising: Dispelling the Myths of Generative Robotic Control | Feb 23, 2026 | | —Unverified | 3 |
| JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation | Feb 22, 2026 | | —Unverified | 3 |
| pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation | Feb 19, 2026 | | —Unverified | 3 |
| PartUV: Part-Based UV Unwrapping of 3D Meshes | Feb 17, 2026 | | —Unverified | 3 |
| AnyUp: Universal Feature Upsampling | Feb 16, 2026 | | —Unverified | 3 |
| LLaDA2.1: Speeding Up Text Diffusion via Token Editing | Feb 13, 2026 | | —Unverified | 3 |
| DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing | Feb 13, 2026 | | —Unverified | 3 |
| Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution | Feb 13, 2026 | | —Unverified | 3 |
| LLM-in-Sandbox Elicits General Agentic Intelligence | Feb 12, 2026 | | —Unverified | 3 |
| DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation | Feb 12, 2026 | | —Unverified | 3 |
| SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes | Feb 9, 2026 | | —Unverified | 3 |
| Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | Feb 6, 2026 | | —Unverified | 3 |
| Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks | Feb 6, 2026 | | —Unverified | 3 |
| Simulating the Visual World with Artificial Intelligence: A Roadmap | Feb 5, 2026 | | —Unverified | 3 |
| Scaling Multiagent Systems with Process Rewards | Feb 4, 2026 | | —Unverified | 3 |
| SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents | Feb 4, 2026 | | —Unverified | 3 |
| HY3D-Bench: Generation of 3D Assets | Feb 3, 2026 | | —Unverified | 3 |
| CL-bench: A Benchmark for Context Learning | Feb 3, 2026 | | —Unverified | 3 |
| MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents | Feb 2, 2026 | | —Unverified | 3 |
| Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars | Feb 2, 2026 | | —Unverified | 3 |
| EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling | Feb 1, 2026 | | —Unverified | 3 |
| A Survey of Token Compression for Efficient Multimodal Large Language Models | Feb 1, 2026 | | —Unverified | 3 |
| LongCat-Flash-Thinking-2601 Technical Report | Feb 1, 2026 | | —Unverified | 3 |
| MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources | Jan 29, 2026 | | —Unverified | 3 |
| Deep Delta Learning | Jan 29, 2026 | | —Unverified | 3 |
| JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion | Jan 29, 2026 | | —Unverified | 3 |
| DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion | Jan 29, 2026 | | —Unverified | 3 |
| TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows | Jan 28, 2026 | | —Unverified | 3 |
| Self-Distillation Enables Continual Learning | Jan 27, 2026 | | —Unverified | 3 |
| Geometry-Grounded Gaussian Splatting | Jan 27, 2026 | | —Unverified | 3 |
| VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency | Jan 26, 2026 | | —Unverified | 3 |
| AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security | Jan 26, 2026 | | —Unverified | 3 |
| EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience | Jan 23, 2026 | | —Unverified | 3 |