SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 391–400 of 5548 papers

Title	Date	Tasks	Status	Hype
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation	May 17, 2025	BenchmarkingQuestion Answering	CodeCode Available	1
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation	May 17, 2025	Benchmarking	—Unverified	0
SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds	May 17, 2025	BenchmarkingBinary Classification	CodeCode Available	0
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges	May 16, 2025	BenchmarkingState Estimation	CodeCode Available	0
Benchmarking CFAR and CNN-based Peak Detection Algorithms in ISAC under Hardware Impairments	May 16, 2025	BenchmarkingIntegrated sensing and communication	—Unverified	0
Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale	May 16, 2025	BenchmarkingTAG	—Unverified	0
ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems	May 16, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models	May 16, 2025	BenchmarkingDecision Making	—Unverified	0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	BenchmarkingEthics	CodeCode Available	0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	May 16, 2025	BenchmarkingMixture-of-Experts	—Unverified	0

Show:10 25 50

← PrevPage 40 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified