| Steering Evaluation-Aware Language Models to Act Like They Are Deployed | Mar 2, 2026 | | —Unverified | 1 |
| ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization | Feb 10, 2026 | | —Unverified | 1 |
| GISA: A Benchmark for General Information-Seeking Assistant | Feb 13, 2026 | | —Unverified | 1 |
| Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark | Feb 25, 2026 | | —Unverified | 1 |
| AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios | Mar 2, 2026 | | —Unverified | 1 |
| Video2Layout: Recall and Reconstruct Metric-Grounded Cognitive Map for Spatial Reasoning | Mar 7, 2026 | | —Unverified | 1 |
| V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval | Feb 25, 2026 | | —Unverified | 1 |
| MediX-R1: Open Ended Medical Reinforcement Learning | Feb 26, 2026 | | —Unverified | 1 |
| MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants | Mar 16, 2026 | | —Unverified | 1 |
| TodoEvolve: Learning to Architect Agent Planning Systems | Feb 8, 2026 | | —Unverified | 1 |
| GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL | Feb 25, 2026 | | —Unverified | 1 |
| Modular Neural Image Signal Processing | Mar 9, 2026 | | —Unverified | 1 |
| OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer | Mar 9, 2026 | | —Unverified | 1 |
| TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size | Mar 9, 2026 | | —Unverified | 1 |
| HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification | Mar 16, 2026 | | —Unverified | 1 |
| AIA: Rethinking Architecture Decoupling Strategy In Unified Multimodal Model | Mar 17, 2026 | | —Unverified | 1 |
| COREA: Coupled Relightable 3D Gaussians and SDFs for Efficient Normal Alignment | Mar 17, 2026 | | —Unverified | 1 |
| Sharing State Between Prompts and Programs | Mar 16, 2026 | | —Unverified | 1 |
| MIST: Mutual Information Estimation Via Supervised Training | Feb 23, 2026 | | —Unverified | 1 |
| Learning Personalized Agents from Human Feedback | Feb 18, 2026 | | —Unverified | 1 |
| SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents | Feb 13, 2026 | | —Unverified | 1 |
| Prism: Spectral-Aware Block-Sparse Attention | Feb 9, 2026 | | —Unverified | 1 |
| General Agent Evaluation | Feb 26, 2026 | | —Unverified | 1 |
| SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing | Mar 19, 2026 | | —Unverified | 1 |
| AlphaApollo: A System for Deep Agentic Reasoning | Mar 10, 2026 | | —Unverified | 1 |
| ORCA: Orchestrated Reasoning with Collaborative Agents for Document Visual Question Answering | Mar 2, 2026 | | —Unverified | 1 |
| Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation | Mar 17, 2026 | | —Unverified | 1 |
| SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation | Feb 26, 2026 | | —Unverified | 1 |
| AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence | Feb 14, 2026 | | —Unverified | 1 |
| Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models | Mar 18, 2026 | | —Unverified | 1 |
| Reinforced Fast Weights with Next-Sequence Prediction | Feb 18, 2026 | | —Unverified | 1 |
| RubricBench: Aligning Model-Generated Rubrics with Human Standards | Mar 3, 2026 | | —Unverified | 1 |
| Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition | Mar 10, 2026 | | —Unverified | 1 |
| Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding | Feb 10, 2026 | | —Unverified | 1 |
| MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning | Feb 10, 2026 | | —Unverified | 1 |
| Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs | Feb 19, 2026 | | —Unverified | 1 |
| World Models That Know When They Don't Know - Controllable Video Generation with Calibrated Uncertainty | Mar 10, 2026 | | —Unverified | 1 |
| SR-Scientist: Scientific Equation Discovery With Agentic AI | Feb 17, 2026 | | —Unverified | 1 |
| MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents | Mar 11, 2026 | | —Unverified | 1 |
| BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? | Mar 3, 2026 | | —Unverified | 1 |
| VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining | Mar 19, 2026 | | —Unverified | 1 |
| Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks | Feb 27, 2026 | | —Unverified | 1 |
| The Geometry of Reasoning: Flowing Logics in Representation Space | Mar 3, 2026 | | —Unverified | 1 |
| Next Visual Granularity Generation | Feb 28, 2026 | | —Unverified | 1 |
| Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning | Mar 12, 2026 | | —Unverified | 1 |
| GameDevBench: Evaluating Agentic Capabilities Through Game Development | Feb 11, 2026 | | —Unverified | 1 |
| Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings | Mar 12, 2026 | | —Unverified | 1 |
| Learning to Configure Agentic AI Systems | Feb 12, 2026 | | —Unverified | 1 |
| Panoramic Affordance Prediction | Mar 16, 2026 | | —Unverified | 1 |
| Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories | Mar 14, 2026 | | —Unverified | 1 |