| From Matching to Generation: A Survey on Generative Information Retrieval | Apr 23, 2024 | Incremental Learning, Information Retrieval | Code Available | 3 | 5 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Oct 19, 2023 | Memorization | Code Available | 3 | 5 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | Math, Mathematical Reasoning | Code Available | 3 | 5 |
| LawBench: Benchmarking Legal Knowledge of Large Language Models | Sep 28, 2023 | Articles, Benchmarking | Code Available | 2 | 5 |
| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Jun 14, 2024 | Memorization | Code Available | 2 | 5 |
| Learning explanations that are hard to vary | Sep 1, 2020 | Memorization | Code Available | 2 | 5 |
| HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Jun 9, 2025 | Combinatorial Optimization, Memorization | Code Available | 2 | 5 |
| HMT: Hierarchical Memory Transformer for Long Context Language Processing | May 9, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models | Jun 7, 2023 | Diversity, Image Generation | Code Available | 2 | 5 |
| Drive Like a Human: Rethinking Autonomous Driving with Large Language Models | Jul 14, 2023 | Autonomous Driving, Common Sense Reasoning | Code Available | 2 | 5 |