
Benchmarking

Papers


Showing 411–420 of 5548 papers

Title	Date	Tasks	Status	Hype
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets	May 16, 2025	Benchmarking, Knowledge Graphs	Unverified	0
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs	May 16, 2025	Benchmarking, Question Answering	Code Available	0
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection	May 16, 2025	Benchmarking, object-detection	Code Available	1
Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities	May 16, 2025	Benchmarking	Unverified	0
Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally	May 15, 2025	Benchmarking, Sentence	Code Available	1
Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization	May 15, 2025	Benchmarking, Clustering	Unverified	0
GNN-Suite: a Graph Neural Network Benchmarking Framework for Biomedical Informatics	May 15, 2025	Benchmarking, Graph Neural Network	Code Available	0
On the Evaluation of Engineering Artificial General Intelligence	May 15, 2025	Benchmarking	Unverified	0
Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming	May 15, 2025	Benchmarking, Data Augmentation	Unverified	0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs	May 15, 2025	Benchmarking, Fairness	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified