Benchmarking

Papers

Showing 3161–3170 of 5548 papers

Title	Date	Tasks	Status	Hype
Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted?	Jun 21, 2023	Benchmarking, Explainable artificial intelligence	Unverified	0
A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking	Jun 21, 2023	Adversarial Robustness, Benchmarking	Unverified	0
IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL	Jun 20, 2023	Benchmarking, Management	Code Available	1
Diverse Community Data for Benchmarking Data Privacy Algorithms	Jun 20, 2023	Benchmarking	Unverified	0
Geometric Deep Learning for Structure-Based Drug Design: A Survey	Jun 20, 2023	Benchmarking, Deep Learning	Code Available	1
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction	Jun 20, 2023	Benchmarking, Document-level Relation Extraction	Code Available	0
Beyond Normal: On the Evaluation of Mutual Information Estimators	Jun 19, 2023	Benchmarking, Domain Generalization	Code Available	1
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery	Jun 19, 2023	Benchmarking, Causal Discovery	Code Available	1
OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems	Jun 19, 2023	Benchmarking, Decoder	Code Available	2
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management	Jun 19, 2023	Benchmarking, Deep Reinforcement Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified