Benchmarking

Papers

Showing 2381–2390 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?	May 7, 2025	Benchmarking, Semantic Segmentation	Code Available	0	5
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations	Jan 24, 2022	Benchmarking, Drug Discovery	Code Available	0	5
Benchmarking Learning Efficiency in Deep Reservoir Computing	Sep 29, 2022	Benchmarking	Code Available	0	5
Flexible Generation of Preference Data for Recommendation Analysis	Jul 23, 2024	Benchmarking, Recommendation Systems	Code Available	0	5
Benchmarking Neural Machine Translation for Southern African Languages	Jun 17, 2019	Benchmarking, Machine Translation	Code Available	0	5
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	Code Available	0	5
Geological Inference from Textual Data using Word Embeddings	Apr 10, 2025	Benchmarking, Word Embeddings	Code Available	0	5
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available	0	5
DQI: Measuring Data Quality in NLP	May 2, 2020	Active Learning, Benchmarking	Code Available	0	5
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation	Apr 21, 2025	Benchmarking	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified