SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2011–2020 of 5548 papers

Title	Date	Tasks	Status	Hype
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Apr 4, 2025	BenchmarkingImage Generation	—Unverified	0
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems	Apr 4, 2025	BenchmarkingModel Selection	CodeCode Available	0
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins	Apr 4, 2025	Benchmarking	—Unverified	0
Do LLM Evaluators Prefer Themselves for a Reason?	Apr 4, 2025	BenchmarkingCode Generation	CodeCode Available	0
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams	Apr 4, 2025	BenchmarkingManagement	—Unverified	0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Apr 4, 2025	BenchmarkingGSM8K	—Unverified	0
Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings	Apr 4, 2025	Benchmarking	CodeCode Available	0
Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation	Apr 4, 2025	AttributeBenchmarking	—Unverified	0
Evaluating AI Recruitment Sourcing Tools by Human Preference	Apr 3, 2025	Benchmarking	CodeCode Available	0
Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge	Apr 3, 2025	AnatomyBenchmarking	—Unverified	0

Show:10 25 50

← PrevPage 202 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified