SOTAVerified

Benchmarking

Papers

Showing 2781–2790 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems	Feb 20, 2025	Benchmarking, Decision Making	Unverified	0	0
The Benchmark Lottery	Jul 14, 2021	Benchmarking, BIG-bench Machine Learning	Unverified	0	0
Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms	Apr 2, 2025	Benchmarking, Semantic Segmentation	Unverified	0	0
Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods	May 17, 2021	Benchmarking, Diversity	Unverified	0	0
Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding	Aug 1, 2020	Benchmarking, Rain Removal	Unverified	0	0
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation	May 17, 2025	Benchmarking	Unverified	0	0
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System	Apr 5, 2024	Benchmarking, GPU	Unverified	0	0
A Benchmark for Multi-speaker Anonymization	Jul 8, 2024	Benchmarking, Disentanglement	Unverified	0	0
Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior	May 9, 2021	Benchmarking, Rain Removal	Unverified	0	0
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks	Jul 29, 2024	Benchmarking, Language Model Evaluation	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified