| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| From Matching to Generation: A Survey on Generative Information Retrieval | Apr 23, 2024 | Incremental Learning, Information Retrieval | Code Available | 3 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Oct 19, 2023 | Memorization | Code Available | 3 |
| HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Jun 9, 2025 | Combinatorial Optimization, Memorization | Code Available | 2 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Apr 14, 2025 | Equation Discovery, Memorization | Code Available | 2 |
| RARE: Retrieval-Augmented Reasoning Modeling | Mar 30, 2025 | Hallucination, Memorization | Code Available | 2 |
| Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Jul 31, 2024 | Image Generation, Memorization | Code Available | 2 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Jul 1, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Jun 14, 2024 | Memorization | Code Available | 2 |
| HMT: Hierarchical Memory Transformer for Long Context Language Processing | May 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |