SOTAVerified

Benchmarking

Papers

Showing 1931-1940 of 5548 papers

Title | Status | Hype
Benchmarking Predictive Coding Networks -- Made Simple | Code | 2
AI Agents That Matter | Code | 1
Overcoming Common Flaws in the Evaluation of Selective Classification Systems | Code | 1
Commute Graph Neural Networks | - | 0
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | - | 0
PerSEval: Assessing Personalization in Text Summarizers | - | 0
GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Code | 1
iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities | Code | 1
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | - | 0
Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified