| Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs | Feb 24, 2026 | | —Unverified | 1 |
| Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting | Feb 24, 2026 | | —Unverified | 1 |
| Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data | Feb 24, 2026 | | —Unverified | 1 |
| GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing | Feb 24, 2026 | | —Unverified | 1 |
| Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking | Feb 24, 2026 | | —Unverified | 1 |
| Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding | Feb 24, 2026 | | —Unverified | 1 |
| MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding | Feb 23, 2026 | | —Unverified | 1 |
| MIST: Mutual Information Estimation Via Supervised Training | Feb 23, 2026 | | —Unverified | 1 |
| MedCLIPSeg: Probabilistic Vision-Language Adaptation for Data-Efficient and Generalizable Medical Image Segmentation | Feb 23, 2026 | | —Unverified | 1 |
| Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations | Feb 22, 2026 | | —Unverified | 1 |
| SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models | Feb 22, 2026 | | —Unverified | 1 |
| WildOS: Open-Vocabulary Object Search in the Wild | Feb 22, 2026 | | —Unverified | 1 |
| RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning | Feb 21, 2026 | | —Unverified | 1 |
| Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum | Feb 20, 2026 | | —Unverified | 1 |
| Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs | Feb 19, 2026 | | —Unverified | 1 |
| Learning Personalized Agents from Human Feedback | Feb 18, 2026 | | —Unverified | 1 |
| Reinforced Fast Weights with Next-Sequence Prediction | Feb 18, 2026 | | —Unverified | 1 |
| Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook | Feb 18, 2026 | | —Unverified | 1 |
| Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs | Feb 18, 2026 | | —Unverified | 1 |
| DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning | Feb 18, 2026 | | —Unverified | 1 |
| m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | Feb 18, 2026 | | —Unverified | 1 |
| ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization | Feb 17, 2026 | | —Unverified | 1 |
| Avey-B | Feb 17, 2026 | | —Unverified | 1 |
| SR-Scientist: Scientific Equation Discovery With Agentic AI | Feb 17, 2026 | | —Unverified | 1 |
| Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs | Feb 17, 2026 | | —Unverified | 1 |
| MARS: Modular Agent with Reflective Search for Automated AI Research | Feb 17, 2026 | | —Unverified | 1 |
| EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing | Feb 16, 2026 | | —Unverified | 1 |
| Revisiting the Platonic Representation Hypothesis: An Aristotelian View | Feb 16, 2026 | | —Unverified | 1 |
| Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models | Feb 16, 2026 | | —Unverified | 1 |
| A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) | Feb 16, 2026 | | —Unverified | 1 |
| Privileged Information Distillation for Language Models | Feb 16, 2026 | | —Unverified | 1 |
| Image Generation with a Sphere Encoder | Feb 16, 2026 | | —Unverified | 1 |
| InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem | Feb 16, 2026 | | —Unverified | 1 |
| Efficient Test-Time Scaling for Small Vision-Language Models | Feb 16, 2026 | | —Unverified | 1 |
| Self-Improving World Modelling with Latent Actions | Feb 15, 2026 | | —Unverified | 1 |
| Scaling Behavior of Discrete Diffusion Language Models | Feb 15, 2026 | | —Unverified | 1 |
| BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses | Feb 15, 2026 | | —Unverified | 1 |
| AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence | Feb 14, 2026 | | —Unverified | 1 |
| GISA: A Benchmark for General Information-Seeking Assistant | Feb 13, 2026 | | —Unverified | 1 |
| SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents | Feb 13, 2026 | | —Unverified | 1 |
| SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise | Feb 13, 2026 | | —Unverified | 1 |
| Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision | Feb 13, 2026 | | —Unverified | 1 |
| Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models | Feb 13, 2026 | | —Unverified | 1 |
| Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion | Feb 12, 2026 | | —Unverified | 1 |
| The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context | Feb 12, 2026 | | —Unverified | 1 |
| P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling | Feb 12, 2026 | | —Unverified | 1 |
| Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching | Feb 12, 2026 | | —Unverified | 1 |
| Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark | Feb 12, 2026 | | —Unverified | 1 |
| Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment | Feb 12, 2026 | | —Unverified | 1 |
| DeepSight: An All-in-One LM Safety Toolkit | Feb 12, 2026 | | —Unverified | 1 |