SOTAVerified

Benchmarking

Papers

Showing 471-480 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | Code | 0 |
| Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents | | 0 |
| Benchmarking Ophthalmology Foundation Models for Clinically Significant Age Macular Degeneration Detection | | 0 |
| DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | Code | 0 |
| Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers | Code | 0 |
| Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | Code | 0 |
| False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims | Code | 0 |
| Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards | Code | 1 |
| Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0 |
| Benchmarking LLMs' Swarm intelligence | Code | 1 |
Page 48 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |