SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1941–1950 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge	Dec 18, 2024	BenchmarkingWorld Knowledge	CodeCode Available	0	5
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants	Oct 28, 2024	Benchmarking	CodeCode Available	0	5
A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations	Dec 16, 2021	Benchmarking	CodeCode Available	0	5
Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?	Nov 6, 2020	Active LearningBenchmarking	CodeCode Available	0	5
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization	Aug 29, 2024	BenchmarkingDiversity	CodeCode Available	0	5
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions	Dec 11, 2024	BenchmarkingQuestion Answering	CodeCode Available	0	5
Beyond Document Page Classification: Design, Datasets, and Challenges	Aug 24, 2023	BenchmarkingClassification	CodeCode Available	0	5
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning	Jan 29, 2019	BenchmarkingDeep Learning	CodeCode Available	0	5
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C)	Oct 6, 2022	Benchmarking	CodeCode Available	0	5
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring	May 27, 2023	BenchmarkingDeblurring	CodeCode Available	0	5

Show:10 25 50

← PrevPage 195 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified