SOTAVerified

Benchmarking

Papers

Showing 2381-2390 of 5548 papers

Title | Status | Hype
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | Code | 0
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations | Code | 0
Benchmarking Learning Efficiency in Deep Reservoir Computing | Code | 0
Flexible Generation of Preference Data for Recommendation Analysis | Code | 0
Benchmarking Neural Machine Translation for Southern African Languages | Code | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0
Geological Inference from Textual Data using Word Embeddings | Code | 0
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0
DQI: Measuring Data Quality in NLP | Code | 0
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Code | 0
Page 239 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified