Benchmarking

Papers

Showing 651–660 of 5548 papers

Title	Date	Tasks	Status	Hype
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies	Jul 22, 2024	Benchmarking, Out-of-Distribution Generalization	Code Available	1
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding	Jul 20, 2024	Benchmarking, Heuristic Search	Code Available	1
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations	Jul 19, 2024	Benchmarking, Fairness	Code Available	1
Restore Anything Model via Efficient Degradation Adaptation	Jul 18, 2024	5-Degradation Blind All-in-One Image Restoration, Benchmarking	Code Available	1
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities	Jul 16, 2024	Benchmarking, Domain Adaptation	Code Available	1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1
Separable Operator Networks	Jul 15, 2024	Benchmarking, GPU	Code Available	1
When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark	Jul 15, 2024	Benchmarking, Graph Learning	Code Available	1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	Code Available	1
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling	Jul 13, 2024	Benchmarking, Math	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified