| AgenticRed: Optimizing Agentic Systems for Automated Red-teaming | Jan 29, 2026 | | Unverified | 1 |
| MuSLR: Multimodal Symbolic Logical Reasoning | Jan 29, 2026 | | Unverified | 1 |
| CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty | Jan 29, 2026 | | Unverified | 1 |
| Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts | Jan 29, 2026 | | Unverified | 1 |
| Do Reasoning Models Enhance Embedding Models? | Jan 29, 2026 | | Unverified | 1 |
| Failing to Explore: Language Models on Interactive Tasks | Jan 29, 2026 | | Unverified | 1 |
| Epistemic Diversity and Knowledge Collapse in Large Language Models | Jan 28, 2026 | | Unverified | 1 |
| Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models | Jan 28, 2026 | | Unverified | 1 |
| Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space | Jan 28, 2026 | | Unverified | 1 |
| Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers | Jan 28, 2026 | | Unverified | 1 |
| Optimal Scaling Needs Optimal Norm | Jan 27, 2026 | | Unverified | 1 |
| SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment | Jan 27, 2026 | | Unverified | 1 |
| Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning | Jan 27, 2026 | | Unverified | 1 |
| CooperBench: Why Coding Agents Cannot be Your Teammates Yet | Jan 26, 2026 | | Unverified | 1 |
| TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models | Jan 26, 2026 | | Unverified | 1 |
| SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios | Jan 26, 2026 | | Unverified | 1 |
| Multimodal Evaluation of Russian-language Architectures | Jan 26, 2026 | | Unverified | 1 |
| PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR | Jan 26, 2026 | | Unverified | 1 |
| One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment | Jan 26, 2026 | | Unverified | 1 |
| Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models | Jan 26, 2026 | | Unverified | 1 |
| AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation | Jan 25, 2026 | | Unverified | 1 |
| Flow-based Extremal Mathematical Structure Discovery | Jan 25, 2026 | | Unverified | 1 |
| UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders | Jan 25, 2026 | | Unverified | 1 |
| TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors | Jan 25, 2026 | | Unverified | 1 |
| OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG | Jan 24, 2026 | | Unverified | 1 |
| Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods | Jan 23, 2026 | | Unverified | 1 |
| WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning | Jan 23, 2026 | | Unverified | 1 |
| FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation | Jan 23, 2026 | | Unverified | 1 |
| Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts | Jan 23, 2026 | | Unverified | 1 |
| Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory | Jan 22, 2026 | | Unverified | 1 |
| Can Language Models Discover Scaling Laws? | Jan 22, 2026 | | Unverified | 1 |
| DSGym: A Holistic Framework for Evaluating and Training Data Science Agents | Jan 22, 2026 | | Unverified | 1 |
| A Mechanistic View on Video Generation as World Models: State and Dynamics | Jan 22, 2026 | | Unverified | 1 |
| Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers? | Jan 21, 2026 | | Unverified | 1 |
| From Charts to Code: A Hierarchical Benchmark for Multimodal Models | Jan 21, 2026 | | Unverified | 1 |
| Universal Reasoning Model | Dec 26, 2025 | | Community Verified (1 reproduction) | 1 |
| NeuroXAI: Adaptive, robust, explainable surrogate framework for determination of channel importance in EEG application | Sep 12, 2025 | channel selection, EEG | Code Available | 1 |
| Tri-Learn Graph Fusion Network for Attributed Graph Clustering | Jul 18, 2025 | Clustering, Deep Clustering | Code Available | 1 |
| FLEXITOKENS: Flexible Tokenization for Evolving Language Models | Jul 17, 2025 | | Code Available | 1 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1 |
| Mitigating Object Hallucinations via Sentence-Level Early Intervention | Jul 16, 2025 | Hallucination, MM-Vet | Code Available | 1 |
| InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | Jul 16, 2025 | Domain Generalization, Face Anti-Spoofing | Code Available | 1 |
| MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network | Jul 15, 2025 | Depth Estimation, Depth Prediction | Code Available | 1 |
| Fairness-Aware Grouping for Continuous Sensitive Variables: Application for Debiasing Face Analysis with respect to Skin Tone | Jul 15, 2025 | Fairness | Code Available | 1 |
| AdaMuon: Adaptive Muon Optimizer | Jul 15, 2025 | | Code Available | 1 |
| Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Jul 15, 2025 | Large Language Model, Scene Understanding | Code Available | 1 |
| Relative Entropy Pathwise Policy Optimization | Jul 15, 2025 | GPU | Code Available | 1 |
| Are Vision Foundation Models Ready for Out-of-the-Box Medical Image Registration? | Jul 15, 2025 | Anatomy, Image Registration | Code Available | 1 |
| MMOne: Representing Multiple Modalities in One Scene | Jul 15, 2025 | | Code Available | 1 |
| UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Jul 15, 2025 | Video Captioning, Video Understanding | Code Available | 1 |