| MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Sep 9, 2024 | Memorization, Question Answering | Code Available | 7 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | Apr 3, 2023 | Common Sense Reasoning, Coreference Resolution | Code Available | 6 |
| LIMO: Less is More for Reasoning | Feb 5, 2025 | Math, Mathematical Reasoning | Code Available | 5 |
| R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | May 22, 2025 | Memorization, RAG | Code Available | 4 |
| VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling | Dec 31, 2024 | Memorization | Code Available | 4 |
| Parameter Efficient Instruction Tuning: An Empirical Study | Nov 25, 2024 | Instruction Following, Memorization | Code Available | 4 |
| MUSE: Machine Unlearning Six-Way Evaluation for Language Models | Jul 8, 2024 | Articles, Machine Unlearning | Code Available | 4 |
| Amortized Planning with Large-Scale Transformers: A Case Study on Chess | Feb 7, 2024 | Memorization | Code Available | 4 |
| Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Jun 9, 2022 | Common Sense Reasoning, Math | Code Available | 4 |
| Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets | Jan 6, 2022 | Memorization | Code Available | 4 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | Math, Mathematical Reasoning | Code Available | 3 |
| From Matching to Generation: A Survey on Generative Information Retrieval | Apr 23, 2024 | Incremental Learning, Information Retrieval | Code Available | 3 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Oct 19, 2023 | Memorization | Code Available | 3 |
| HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Jun 9, 2025 | Combinatorial Optimization, Memorization | Code Available | 2 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Apr 14, 2025 | Equation Discovery, Memorization | Code Available | 2 |
| RARE: Retrieval-Augmented Reasoning Modeling | Mar 30, 2025 | Hallucination, Memorization | Code Available | 2 |
| Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Jul 31, 2024 | Image Generation, Memorization | Code Available | 2 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Jul 1, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Jun 14, 2024 | Memorization | Code Available | 2 |
| HMT: Hierarchical Memory Transformer for Long Context Language Processing | May 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data | Mar 20, 2024 | Memorization | Code Available | 2 |
| A Decade's Battle on Dataset Bias: Are We There Yet? | Mar 13, 2024 | Memorization | Code Available | 2 |
| SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Mar 4, 2024 | Benchmarking, Drug Discovery | Code Available | 2 |
| Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Nov 10, 2023 | Inference Attack, Membership Inference Attack | Code Available | 2 |
| LawBench: Benchmarking Legal Knowledge of Large Language Models | Sep 28, 2023 | Articles, Benchmarking | Code Available | 2 |
| SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool | Aug 8, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| Drive Like a Human: Rethinking Autonomous Driving with Large Language Models | Jul 14, 2023 | Autonomous Driving, Common Sense Reasoning | Code Available | 2 |
| Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models | Jun 7, 2023 | Diversity, Image Generation | Code Available | 2 |
| Causal Reasoning and Large Language Models: Opening a New Frontier for Causality | Apr 28, 2023 | Causal Discovery, Common Sense Reasoning | Code Available | 2 |
| DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | Nov 18, 2022 | Code Generation, Memorization | Code Available | 2 |
| Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning | May 29, 2022 | Few-Shot Text Classification, Memorization | Code Available | 2 |
| PaLM: Scaling Language Modeling with Pathways | Apr 5, 2022 | Auto Debugging, Code Generation | Code Available | 2 |
| Quantifying Memorization Across Neural Language Models | Feb 15, 2022 | Fairness, Memorization | Code Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| Learning explanations that are hard to vary | Sep 1, 2020 | Memorization | Code Available | 2 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Generative Modeling of Weights: Generalization or Memorization? | Jun 9, 2025 | Memorization, Video Generation | Code Available | 1 |
| How does Transformer Learn Implicit Reasoning? | May 29, 2025 | Clustering, Diagnostic | Code Available | 1 |
| Pre-training Large Memory Language Models with Internal and External Knowledge | May 21, 2025 | Memorization | Code Available | 1 |
| Generative Evaluation of Complex Reasoning in Large Language Models | Apr 3, 2025 | Benchmarking, Memorization | Code Available | 1 |
| GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Apr 2, 2025 | Decision Making, Diagnostic | Code Available | 1 |
| LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty | Mar 24, 2025 | Machine Unlearning, Memorization | Code Available | 1 |
| Can Language Models Follow Multiple Turns of Entangled Instructions? | Mar 17, 2025 | Instruction Following, Memorization | Code Available | 1 |
| Data Unlearning in Diffusion Models | Mar 2, 2025 | Machine Unlearning, Memorization | Code Available | 1 |
| Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs | Feb 7, 2025 | Federated Learning, Medical Question Answering | Code Available | 1 |
| Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering | Dec 24, 2024 | Image Classification | Code Available | 1 |
| Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers | Dec 22, 2024 | Memorization, Test-time Adaptation | Code Available | 1 |
| The Complexity Dynamics of Grokking | Dec 13, 2024 | Generalization Bounds, Memorization | Code Available | 1 |
| The Pitfalls of Memorization: When Memorization Hurts Generalization | Dec 10, 2024 | Memorization | Code Available | 1 |
| What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Nov 12, 2024 | GSM8K, Math | Code Available | 1 |