SOTAVerified

Memorization

Papers

Showing 1–50 of 1088 papers

| Title | Status | Hype |
|---|---|---|
| MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | Code | 7 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | Code | 6 |
| LIMO: Less is More for Reasoning | Code | 5 |
| Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Code | 4 |
| Parameter Efficient Instruction Tuning: An Empirical Study | Code | 4 |
| VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling | Code | 4 |
| Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets | Code | 4 |
| Amortized Planning with Large-Scale Transformers: A Case Study on Chess | Code | 4 |
| MUSE: Machine Unlearning Six-Way Evaluation for Language Models | Code | 4 |
| R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning | Code | 4 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Code | 3 |
| AgentTuning: Enabling Generalized Agent Abilities for LLMs | Code | 3 |
| From Matching to Generation: A Survey on Generative Information Retrieval | Code | 3 |
| LawBench: Benchmarking Legal Knowledge of Large Language Models | Code | 2 |
| PaLM: Scaling Language Modeling with Pathways | Code | 2 |
| HMT: Hierarchical Memory Transformer for Long Context Language Processing | Code | 2 |
| Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data | Code | 2 |
| Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Code | 2 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Code | 2 |
| RARE: Retrieval-Augmented Reasoning Modeling | Code | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| Quantifying Memorization Across Neural Language Models | Code | 2 |
| SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Code | 2 |
| Causal Reasoning and Large Language Models: Opening a New Frontier for Causality | Code | 2 |
| Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Code | 2 |
| A Decade's Battle on Dataset Bias: Are We There Yet? | Code | 2 |
| SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool | Code | 2 |
| Learning explanations that are hard to vary | Code | 2 |
| HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | Code | 2 |
| Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models | Code | 2 |
| Detecting, Explaining, and Mitigating Memorization in Diffusion Models | Code | 2 |
| Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning | Code | 2 |
| Drive Like a Human: Rethinking Autonomous Driving with Large Language Models | Code | 2 |
| DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | Code | 2 |
| LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | Code | 2 |
| Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Code | 1 |
| Data Unlearning in Diffusion Models | Code | 1 |
| Beyond Gradient Averaging in Parallel Optimization: Improved Robustness through Gradient Agreement Filtering | Code | 1 |
| Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models | Code | 1 |
| Data Contamination Can Cross Language Barriers | Code | 1 |
| DAT: Training Deep Networks Robust To Label-Noise by Matching the Feature Distributions | Code | 1 |
| Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning | Code | 1 |
| C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation | Code | 1 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Code | 1 |
| AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ | Code | 1 |
| Zero-Shot Compositional Policy Learning via Language Grounding | Code | 1 |
| Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels | Code | 1 |
| DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity | Code | 1 |
| Adaptive Early-Learning Correction for Segmentation from Noisy Annotations | Code | 1 |
| Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM-540B (few-shot, k=5) | Accuracy | 95.4 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 80 | | Unverified |
| 3 | PaLM-62B (few-shot, k=5) | Accuracy | 77.7 | | Unverified |