SOTAVerified

Benchmarking

Papers

Showing 681-690 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | Code | 1 |
| Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection | Code | 1 |
| Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1 |
| Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Code | 1 |
| AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | Code | 1 |
| A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking | Code | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Code | 1 |
| DFGC 2021: A DeepFake Game Competition | Code | 1 |
| Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |