Benchmarking

Papers

Showing 1871–1880 of 5548 papers

Title	Date	Tasks	Status	Hype
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge	May 8, 2025	Benchmarking	Code Available	0
Autoregressive Stochastic Clock Jitter Compensation in Analog-to-Digital Converters	May 8, 2025	Benchmarking	Unverified	0
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation	May 8, 2025	Benchmarking, Federated Learning	Unverified	0
Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective	May 8, 2025	Active Learning, Benchmarking	Code Available	0
Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection	May 8, 2025	Benchmarking, Out-of-Distribution Generalization	Unverified	0
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents	May 8, 2025	Benchmarking	Unverified	0
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available	0
Alpha Excel Benchmark	May 7, 2025	Benchmarking	Unverified	0
Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers	May 7, 2025	Benchmarking, Fault Detection	Code Available	0
False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims	May 7, 2025	Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified