SOTAVerified

Benchmarking

Papers

Showing 3161-3170 of 5548 papers

Title | Status | Hype
Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted? | - | 0
A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking | - | 0
IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Code | 1
Diverse Community Data for Benchmarking Data Privacy Algorithms | - | 0
Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Code | 0
Beyond Normal: On the Evaluation of Mutual Information Estimators | Code | 1
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery | Code | 1
OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems | Code | 2
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management | - | 0
Page 317 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified