Benchmarking

Papers


Showing 3601–3625 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images	Jul 30, 2024	Benchmarking, Multiple Instance Learning	Unverified
Benchmarking high-fidelity pedestrian tracking systems for research, real-time monitoring and crowd control	Aug 26, 2021	Benchmarking, Density Estimation	Unverified
What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI	Feb 29, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images	May 24, 2024	Benchmarking, Classification	Unverified
ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects	Nov 12, 2021	Benchmarking, Causal Inference	Unverified
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	May 16, 2025	Benchmarking, Mixture-of-Experts	Unverified
MIRAI: Evaluating LLM Agents for Event Forecasting	Jul 1, 2024	Articles, Benchmarking	Unverified
MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?	Feb 14, 2025	Benchmarking, In-Context Learning	Unverified
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability	Jun 16, 2022	Benchmarking, Feature Importance	Unverified
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models	Mar 10, 2025	All, Benchmarking	Unverified
Benchmarking Hebbian learning rules for associative memory	Dec 30, 2023	Benchmarking	Unverified
Mitigating severe over-parameterization in deep convolutional neural networks through forced feature abstraction and compression with an entropy-based heuristic	Jun 27, 2021	Benchmarking, Feature Compression	Unverified
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices	Nov 29, 2023	Benchmarking, Federated Learning	Unverified
A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing	Dec 7, 2024	Benchmarking, Dimensionality Reduction	Unverified
Benchmarking Harmonized Tariff Schedule Classification Models	Dec 4, 2024	Benchmarking, Classification	Unverified
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation	Feb 3, 2025	Benchmarking, Fairness	Unverified
Towards Large-Scale Small Object Detection: Survey and Benchmarks	Jul 28, 2022	Benchmarking, Object	Unverified
MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking	Jul 14, 2025	Benchmarking, Language Modeling	Unverified
Towards Long-Term predictions of Turbulence using Neural Operators	Jul 25, 2023	Benchmarking	Unverified
Benchmarking Graph Neural Networks on Link Prediction	Feb 24, 2021	Benchmarking, Graph Attention	Unverified
MLHarness: A Scalable Benchmarking System for MLCommons	Nov 9, 2021	Benchmarking	Unverified
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs	May 12, 2025	Benchmarking, Document Layout Analysis	Unverified
MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale	Sep 25, 2019	Benchmarking	Unverified
MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale	Feb 19, 2020	Benchmarking	Unverified
A Dataset for Movie Description	Jan 12, 2015	Benchmarking, Descriptive	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified