SOTAVerified

Benchmarking

Papers

Showing 3151–3175 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | | 0 |
| MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models | Code | 2 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1 |
| OptIForest: Optimal Isolation Forest for Anomaly Detection | Code | 0 |
| Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Code | 1 |
| GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Code | 1 |
| On-orbit model training for satellite imagery with label proportions | Code | 0 |
| On Evaluation of Document Classification using RVL-CDIP | | 0 |
| VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Code | 1 |
| Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features | Code | 1 |
| Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted? | | 0 |
| A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking | | 0 |
| IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Code | 1 |
| Diverse Community Data for Benchmarking Data Privacy Algorithms | | 0 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1 |
| Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Code | 0 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Code | 1 |
| causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery | Code | 1 |
| OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems | Code | 2 |
| Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management | | 0 |
| Fairness Index Measures to Evaluate Bias in Biometric Recognition | | 0 |
| Using Motif Transitions for Temporal Graph Generation | Code | 0 |
| OpenDataVal: a Unified Benchmark for Data Valuation | Code | 1 |
| Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking | Code | 1 |
| Formal Covariate Benchmarking to Bound Omitted Variable Bias | | 0 |
Page 127 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |