SOTAVerified

Benchmarking

Papers

Showing 291–300 of 5548 papers

Title	Date	Tasks	Status	Hype
Assessing SPARQL capabilities of Large Language Models	Sep 9, 2024	Benchmarking, Knowledge Graphs	Code Available	2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Jul 15, 2025	Benchmarking, Instruction Following	Code Available	2
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model	Sep 30, 2022	Benchmarking, Blind Docking	Code Available	2
Deep Visual Geo-localization Benchmark	Apr 7, 2022	Benchmarking, Data Augmentation	Code Available	2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)	Jan 14, 2023	Benchmarking	Code Available	2
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation	Aug 17, 2022	Benchmarking, Code Generation	Code Available	2
EasyTPP: Towards Open Benchmarking Temporal Point Processes	Jul 16, 2023	Benchmarking, Point Processes	Code Available	2
Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details	Feb 1, 2021	Benchmarking, object-detection	Code Available	2
A Survey on Multimodal Benchmarks: In the Era of Large AI Models	Sep 21, 2024	Benchmarking, Survey	Code Available	2
Fortuna: A Library for Uncertainty Quantification in Deep Learning	Feb 8, 2023	Bayesian Inference, Benchmarking	Code Available	2

Page 30 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified