SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3601–3610 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images	Jul 30, 2024	BenchmarkingMultiple Instance Learning	—Unverified	0	0
Benchmarking high-fidelity pedestrian tracking systems for research, real-time monitoring and crowd control	Aug 26, 2021	BenchmarkingDensity Estimation	—Unverified	0	0
What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI	Feb 29, 2020	BenchmarkingBIG-bench Machine Learning	—Unverified	0	0
Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images	May 24, 2024	BenchmarkingClassification	—Unverified	0	0
ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects	Nov 12, 2021	BenchmarkingCausal Inference	—Unverified	0	0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	May 16, 2025	BenchmarkingMixture-of-Experts	—Unverified	0	0
MIRAI: Evaluating LLM Agents for Event Forecasting	Jul 1, 2024	ArticlesBenchmarking	—Unverified	0	0
MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?	Feb 14, 2025	BenchmarkingIn-Context Learning	—Unverified	0	0
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability	Jun 16, 2022	BenchmarkingFeature Importance	—Unverified	0	0
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models	Mar 10, 2025	AllBenchmarking	—Unverified	0	0

Show:10 25 50

← PrevPage 361 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified