Benchmarking

Papers

Showing 471–480 of 5548 papers

Title	Date	Tasks	Status	Hype
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge	May 8, 2025	Benchmarking	Code Available	0
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents	May 8, 2025	Benchmarking	Unverified	0
Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection	May 8, 2025	Benchmarking, Out-of-Distribution Generalization	Unverified	0
DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions	May 8, 2025	Autonomous Navigation, Benchmarking	Code Available	0
Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers	May 7, 2025	Benchmarking, Fault Detection	Code Available	0
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?	May 7, 2025	Benchmarking, Semantic Segmentation	Code Available	0
False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims	May 7, 2025	Benchmarking	Code Available	0
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards	May 7, 2025	Benchmarking, Hallucination	Code Available	1
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available	0
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified