SOTAVerified

Memorization

Papers

Showing 21-30 of 1088 papers

Title | Status | Hype
Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data | Code | 2
A Decade's Battle on Dataset Bias: Are We There Yet? | Code | 2
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Code | 2
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Code | 2
LawBench: Benchmarking Legal Knowledge of Large Language Models | Code | 2
SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool | Code | 2
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models | Code | 2
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models | Code | 2
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality | Code | 2
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PaLM-540B (few-shot, k=5) | Accuracy | 95.4 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 80 | | Unverified
3 | PaLM-62B (few-shot, k=5) | Accuracy | 77.7 | | Unverified