SOTAVerified

Memorization

Papers

Showing 21–30 of 1088 papers

Title	Date	Tasks	Status	Hype
Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data	Mar 20, 2024	Memorization	Code Available	2
A Decade's Battle on Dataset Bias: Are We There Yet?	Mar 13, 2024	Memorization	Code Available	2
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis	Mar 4, 2024	Benchmarking, Drug Discovery	Code Available	2
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration	Nov 10, 2023	Inference Attack, Membership Inference Attack	Code Available	2
LawBench: Benchmarking Legal Knowledge of Large Language Models	Sep 28, 2023	Articles, Benchmarking	Code Available	2
SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool	Aug 8, 2023	Language Modeling, Language Modelling	Code Available	2
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models	Jul 14, 2023	Autonomous Driving, Common Sense Reasoning	Code Available	2
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models	Jun 7, 2023	Diversity, Image Generation	Code Available	2
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality	Apr 28, 2023	Causal Discovery, Common Sense Reasoning	Code Available	2
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation	Nov 18, 2022	Code Generation, Memorization	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PaLM-540B (few-shot, k=5)	Accuracy	95.4	—	Unverified
2	Gopher-280B (few-shot, k=5)	Accuracy	80	—	Unverified
3	PaLM-62B (few-shot, k=5)	Accuracy	77.7	—	Unverified