SOTAVerified

Benchmarking

Papers

Showing 3151–3160 of 5548 papers

Title | Status | Hype
My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | - | 0
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1
OptIForest: Optimal Isolation Forest for Anomaly Detection | Code | 0
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Code | 1
GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Code | 1
On-orbit model training for satellite imagery with label proportions | Code | 0
On Evaluation of Document Classification using RVL-CDIP | - | 0
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Code | 1
Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features | Code | 1
Page 316 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified