SOTAVerified

Memorization

Papers

Showing 150 of 1088 papers

| Title | Status | Hype |
|---|---|---|
| MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Code | 7 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | Code | 6 |
| LIMO: Less is More for Reasoning | Code | 5 |
| R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Code | 4 |
| VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling | Code | 4 |
| Parameter Efficient Instruction Tuning: An Empirical Study | Code | 4 |
| MUSE: Machine Unlearning Six-Way Evaluation for Language Models | Code | 4 |
| Amortized Planning with Large-Scale Transformers: A Case Study on Chess | Code | 4 |
| Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Code | 4 |
| Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets | Code | 4 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3 |
| From Matching to Generation: A Survey on Generative Information Retrieval | Code | 3 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Code | 3 |
| HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Code | 2 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Code | 2 |
| RARE: Retrieval-Augmented Reasoning Modeling | Code | 2 |
| Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Code | 2 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Code | 2 |
| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Code | 2 |
| HMT: Hierarchical Memory Transformer for Long Context Language Processing | Code | 2 |
| Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data | Code | 2 |
| A Decade's Battle on Dataset Bias: Are We There Yet? | Code | 2 |
| SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Code | 2 |
| Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Code | 2 |
| LawBench: Benchmarking Legal Knowledge of Large Language Models | Code | 2 |
| SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool | Code | 2 |
| Drive Like a Human: Rethinking Autonomous Driving with Large Language Models | Code | 2 |
| Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models | Code | 2 |
| Causal Reasoning and Large Language Models: Opening a New Frontier for Causality | Code | 2 |
| DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | Code | 2 |
| Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning | Code | 2 |
| PaLM: Scaling Language Modeling with Pathways | Code | 2 |
| Quantifying Memorization Across Neural Language Models | Code | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| Learning explanations that are hard to vary | Code | 2 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1 |
| Generative Modeling of Weights: Generalization or Memorization? | Code | 1 |
| How does Transformer Learn Implicit Reasoning? | Code | 1 |
| Pre-training Large Memory Language Models with Internal and External Knowledge | Code | 1 |
| Generative Evaluation of Complex Reasoning in Large Language Models | Code | 1 |
| GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning | Code | 1 |
| LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty | Code | 1 |
| Can Language Models Follow Multiple Turns of Entangled Instructions? | Code | 1 |
| Data Unlearning in Diffusion Models | Code | 1 |
| Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs | Code | 1 |
| Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering | Code | 1 |
| Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers | Code | 1 |
| The Complexity Dynamics of Grokking | Code | 1 |
| The Pitfalls of Memorization: When Memorization Hurts Generalization | Code | 1 |
| What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Code | 1 |
Page 1 of 22

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 95.4 | — | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 80 | — | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 77.7 | — | Unverified |