| Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning | Jan 27, 2026 | | —Unverified | 1 |
| Vector Quantization using Gaussian Variational Autoencoder | Feb 5, 2026 | | —Unverified | 1 |
| T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning | Feb 6, 2026 | | —Unverified | 1 |
| daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently | Feb 4, 2026 | | —Unverified | 1 |
| SWE-Exp: Experience-Driven Software Issue Resolution | Feb 2, 2026 | | —Unverified | 1 |
| LIVE: Long-horizon Interactive Video World Modeling | Feb 3, 2026 | | —Unverified | 1 |
| ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation | Jan 29, 2026 | | —Unverified | 1 |
| OpenAutoNLU: Open Source AutoML Library for NLU | Mar 2, 2026 | | —Unverified | 1 |
| Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models | Feb 9, 2026 | | —Unverified | 1 |
| m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | Feb 18, 2026 | | —Unverified | 1 |
| Evaluating and Steering Modality Preferences in Multimodal Large Language Model | Feb 4, 2026 | | —Unverified | 1 |
| How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing | Feb 2, 2026 | | —Unverified | 1 |
| V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration | Mar 13, 2026 | | —Unverified | 1 |
| Matryoshka Gaussian Splatting | Mar 19, 2026 | | —Unverified | 1 |
| LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation | Feb 2, 2026 | | —Unverified | 1 |
| Mano: Restriking Manifold Optimization for LLM Training | Jan 30, 2026 | | —Unverified | 1 |
| CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges | Mar 12, 2026 | | —Unverified | 1 |
| When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs | Feb 4, 2026 | | —Unverified | 1 |
| Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following | Mar 12, 2026 | | —Unverified | 1 |
| CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty | Jan 29, 2026 | | —Unverified | 1 |
| Scaling Behavior of Discrete Diffusion Language Models | Feb 15, 2026 | | —Unverified | 1 |
| MARS: Modular Agent with Reflective Search for Automated AI Research | Feb 17, 2026 | | —Unverified | 1 |
| RISE-Video: Can Video Generators Decode Implicit World Rules? | Feb 5, 2026 | | —Unverified | 1 |
| Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling | Feb 3, 2026 | | —Unverified | 1 |
| Which Heads Matter for Reasoning? RL-Guided KV Cache Compression | Jan 30, 2026 | | —Unverified | 1 |
| PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature | Jan 30, 2026 | | —Unverified | 1 |
| ObjEmbed: Towards Universal Multimodal Object Embeddings | Feb 3, 2026 | | —Unverified | 1 |
| ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models | Mar 19, 2026 | | —Unverified | 1 |
| DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report | Jan 30, 2026 | | —Unverified | 1 |
| TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents | Feb 3, 2026 | | —Unverified | 1 |
| DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset | Jan 30, 2026 | | —Unverified | 1 |
| Show, Don't Tell: Morphing Latent Reasoning into Image Generation | Feb 2, 2026 | | —Unverified | 1 |
| Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts | Jan 23, 2026 | | —Unverified | 1 |
| LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs | Mar 19, 2026 | | —Unverified | 1 |
| AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models | Mar 1, 2026 | | —Unverified | 1 |
| FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use | Mar 9, 2026 | | —Unverified | 1 |
| Learning Self-Correction in Vision-Language Models via Rollout Augmentation | Feb 9, 2026 | | —Unverified | 1 |
| How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition | Mar 16, 2026 | | —Unverified | 1 |
| Image Generation with a Sphere Encoder | Feb 16, 2026 | | —Unverified | 1 |
| Can Vision-Language Models Solve the Shell Game? | Mar 9, 2026 | | —Unverified | 1 |
| Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening | Feb 6, 2026 | | —Unverified | 1 |
| LLM Probability Concentration: How Alignment Shrinks the Generative Horizon | Mar 2, 2026 | | —Unverified | 1 |
| Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs | Feb 17, 2026 | | —Unverified | 1 |
| Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration | Feb 25, 2026 | | —Unverified | 1 |
| WildOS: Open-Vocabulary Object Search in the Wild | Feb 22, 2026 | | —Unverified | 1 |
| Chain of World: World Model Thinking in Latent Motion | Mar 3, 2026 | | —Unverified | 1 |
| ContextBench: A Benchmark for Context Retrieval in Coding Agents | Feb 11, 2026 | | —Unverified | 1 |
| Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing | Feb 10, 2026 | | —Unverified | 1 |
| Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing | Feb 6, 2026 | | —Unverified | 1 |
| LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning | Mar 13, 2026 | | —Unverified | 1 |