SOTAVerified

Benchmarking

Papers


Showing 1681–1690 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Clinical Decision Support Search	Jan 29, 2018	Articles, Benchmarking	Unverified	0
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models	Feb 17, 2025	Benchmarking	Unverified	0
Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition	Jan 14, 2025	Activity Recognition, Benchmarking	Unverified	0
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis	Dec 8, 2023	Benchmarking, Quantization	Unverified	0
Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies	Mar 10, 2025	Benchmarking, Ethics	Unverified	0
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection	Jun 5, 2024	Anomaly Detection, Benchmarking	Unverified	0
ABOUT ML: Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles	Dec 12, 2019	Benchmarking, BIG-bench Machine Learning	Unverified	0
Demographic Parity: Mitigating Biases in Real-World Data	Sep 27, 2023	Benchmarking	Unverified	0
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs	Sep 9, 2024	Benchmarking, knowledge editing	Unverified	0
A New Stereo Benchmarking Dataset for Satellite Images	Jul 9, 2019	Benchmarking	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified