SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 821–830 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	CodeCode Available	1	5
How to Benchmark Vision Foundation Models for Semantic Segmentation?	Apr 18, 2024	BenchmarkingDecoder	CodeCode Available	1	5
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	CodeCode Available	1	5
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks	Apr 5, 2022	Benchmarking	CodeCode Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	BenchmarkingGPU	CodeCode Available	1	5
Benchmarking LLMs for Political Science: A United Nations Perspective	Feb 19, 2025	BenchmarkingDecision Making	CodeCode Available	1	5
HyFactor: Hydrogen-count labelled graph-based defactorization Autoencoder	Dec 6, 2021	BenchmarkingGraph Learning	CodeCode Available	1	5
Earnings-22: A Practical Benchmark for Accents in the Wild	Mar 29, 2022	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available	1	5
EBES: Easy Benchmarking for Event Sequences	Oct 4, 2024	Benchmarking	CodeCode Available	1	5
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	BenchmarkingCommunity Detection	CodeCode Available	1	5

Show:10 25 50

← PrevPage 83 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified