| MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents | Feb 2, 2026 | | —Unverified | 3 |
| Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence | Mar 8, 2026 | | —Unverified | 3 |
| OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence | Feb 26, 2026 | | —Unverified | 3 |
| Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision | Mar 1, 2026 | | —Unverified | 3 |
| Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making | Feb 6, 2026 | | —Unverified | 3 |
| LongCat-Flash-Thinking-2601 Technical Report | Feb 1, 2026 | | —Unverified | 3 |
| Tree Search for LLM Agent Reinforcement Learning | Mar 18, 2026 | | —Unverified | 3 |
| Light of Normals: Unified Feature Representation for Universal Photometric Stereo | Mar 9, 2026 | | —Unverified | 3 |
| SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents | Feb 4, 2026 | | —Unverified | 3 |
| Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing | Mar 3, 2026 | | —Unverified | 3 |
| Qianfan-OCR: A Unified End-to-End Model for Document Intelligence | Mar 11, 2026 | | —Unverified | 3 |
| Grounding World Simulation Models in a Real-World Metropolis | Mar 16, 2026 | | —Unverified | 3 |
| UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections | Mar 16, 2026 | | —Unverified | 3 |
| EO-1: An Open Unified Embodied Foundation Model for General Robot Control | Feb 25, 2026 | | —Unverified | 3 |
| MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources | Jan 29, 2026 | | —Unverified | 3 |
| JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation | Feb 22, 2026 | | —Unverified | 3 |
| pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation | Feb 19, 2026 | | —Unverified | 3 |
| LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination | Mar 11, 2026 | | —Unverified | 3 |
| LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels | Mar 13, 2026 | | —Unverified | 3 |
| Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders | Jan 22, 2026 | | —Unverified | 3 |
| Human3R: Everyone Everywhere All at Once | Mar 3, 2026 | | —Unverified | 3 |
| Geometry-Grounded Gaussian Splatting | Jan 27, 2026 | | —Unverified | 3 |
| ReactMotion: Generating Reactive Listener Motions from Speaker Utterance | Mar 16, 2026 | | —Unverified | 3 |
| Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution | Feb 13, 2026 | | —Unverified | 3 |
| InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing | Mar 10, 2026 | | —Unverified | 3 |
| Scaling Multiagent Systems with Process Rewards | Feb 4, 2026 | | —Unverified | 3 |
| EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering | Mar 2, 2026 | | —Unverified | 3 |
| RLP: Reinforcement as a Pretraining Objective | Mar 1, 2026 | | —Unverified | 3 |
| VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency | Jan 26, 2026 | | —Unverified | 3 |
| Latent Diffusion Model without Variational Autoencoder | Mar 2, 2026 | | —Unverified | 3 |
| OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data | Mar 16, 2026 | | —Unverified | 3 |
| Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights | Mar 12, 2026 | | —Unverified | 3 |
| DVD: Deterministic Video Depth Estimation with Generative Priors | Mar 12, 2026 | | —Unverified | 3 |
| SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis | Mar 16, 2026 | | —Unverified | 3 |
| Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars | Feb 2, 2026 | | —Unverified | 3 |
| AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security | Jan 26, 2026 | | —Unverified | 3 |
| SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation | Mar 17, 2026 | | —Unverified | 3 |
| TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows | Jan 28, 2026 | | —Unverified | 3 |
| DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion | Jan 29, 2026 | | —Unverified | 3 |
| FireRed-OCR Technical Report | Mar 2, 2026 | | —Unverified | 3 |
| AnyUp: Universal Feature Upsampling | Feb 16, 2026 | | —Unverified | 3 |
| PartUV: Part-Based UV Unwrapping of 3D Meshes | Feb 17, 2026 | | —Unverified | 3 |
| EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling | Feb 1, 2026 | | —Unverified | 3 |
| Much Ado About Noising: Dispelling the Myths of Generative Robotic Control | Feb 23, 2026 | | —Unverified | 3 |
| Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding | Feb 26, 2026 | | —Unverified | 3 |
| χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies | Mar 17, 2026 | | —Unverified | 3 |
| InstantSfM: Towards GPU-Native SfM for the Deep Learning Era | Mar 11, 2026 | | —Unverified | 3 |
| Simulating the Visual World with Artificial Intelligence: A Roadmap | Feb 5, 2026 | | —Unverified | 3 |
| EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience | Jan 23, 2026 | | —Unverified | 3 |
| The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution | Feb 26, 2026 | | —Unverified | 3 |