| Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Jul 13, 2025 | Attribute, Benchmarking | Code Available | 0 |
| Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs? | Jun 20, 2025 | Book summarization, Long-Context Understanding | Code Available | 1 |
| PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding | Jun 18, 2025 | Long-Context Understanding | Unverified | 0 |
| DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration | Jun 6, 2025 | Computational Efficiency, Language Modeling | Code Available | 1 |
| MesaNet: Sequence Modeling by Locally Optimal Test-Time Training | Jun 5, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| ATLAS: Learning to Optimally Memorize the Context at Test Time | May 29, 2025 | Common Sense Reasoning, Language Modeling | Unverified | 0 |
| SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | May 27, 2025 | 16k, Long-Context Understanding | Code Available | 0 |
| Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression | May 26, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models | May 26, 2025 | Data Compression, Long-Context Understanding | Code Available | 1 |
| Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning | May 22, 2025 | Long-Context Understanding | Unverified | 0 |
| Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels | May 20, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning | May 16, 2025 | Hallucination, Information Retrieval | Unverified | 0 |
| LiveLongBench: Tackling Long-Context Understanding for Spoken Texts from Live Streams | Apr 24, 2025 | Long-Context Understanding, Spoken Language Understanding | Code Available | 1 |
| MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores | Apr 23, 2025 | Long-Context Understanding, token-classification | Unverified | 0 |
| LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement | Apr 22, 2025 | Benchmarking, Language Modeling | Code Available | 1 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 |
| CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning | Mar 14, 2025 | Long-Context Understanding | Code Available | 1 |
| Token Weighting for Long-Range Language Modeling | Mar 12, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Mar 11, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Self-Taught Agentic Long Context Understanding | Feb 21, 2025 | Long-Context Understanding | Code Available | 1 |
| LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning | Feb 20, 2025 | In-Context Learning, Long-Context Understanding | Unverified | 0 |
| SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning | Feb 19, 2025 | Long-Context Understanding | Code Available | 0 |
| Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning | Feb 18, 2025 | 2k, Long-Context Understanding | Unverified | 0 |
| Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Feb 12, 2025 | Benchmarking, Long-Context Understanding | Code Available | 2 |
| BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models | Feb 11, 2025 | Code Generation, Instruction Following | Code Available | 1 |