Benchmarking

Papers

Showing 1931–1940 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Predictive Coding Networks -- Made Simple	Jul 1, 2024	Benchmarking	Code Available	2
AI Agents That Matter	Jul 1, 2024	Benchmarking	Code Available	1
Overcoming Common Flaws in the Evaluation of Selective Classification Systems	Jul 1, 2024	Benchmarking, Classification	Code Available	1
Commute Graph Neural Networks	Jun 30, 2024	Benchmarking	Unverified	0
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing	Jun 30, 2024	Benchmarking, counterfactual	Unverified	0
PerSEval: Assessing Personalization in Text Summarizers	Jun 29, 2024	Benchmarking, Human Judgment Correlation	Unverified	0
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	Benchmarking, Hallucination	Code Available	1
iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities	Jun 27, 2024	Benchmarking	Code Available	1
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges	Jun 27, 2024	Benchmarking, Clinical Knowledge	Unverified	0
Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives	Jun 27, 2024	Benchmarking	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified