SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1811–1820 of 5548 papers

Title	Date	Tasks	Status	Hype
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation	May 17, 2025	Benchmarking	—Unverified	0
SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds	May 17, 2025	BenchmarkingBinary Classification	CodeCode Available	0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	BenchmarkingEthics	CodeCode Available	0
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models	May 16, 2025	BenchmarkingDecision Making	—Unverified	0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	May 16, 2025	BenchmarkingMixture-of-Experts	—Unverified	0
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents	May 16, 2025	BenchmarkingInstruction Following	—Unverified	0
Benchmarking CFAR and CNN-based Peak Detection Algorithms in ISAC under Hardware Impairments	May 16, 2025	BenchmarkingIntegrated sensing and communication	—Unverified	0
Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models	May 16, 2025	Benchmarking	—Unverified	0
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	BenchmarkingLanguage Modeling	—Unverified	0
VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks	May 16, 2025	BenchmarkingLink Prediction	CodeCode Available	0

Show:10 25 50

← PrevPage 182 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified