Benchmarking

Papers

Showing 1476–1500 of 5548 papers

Title	Date	Tasks	Status	Hype
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets	Jun 13, 2019	Benchmarking, Document Classification	Code Available	1
MNIST-C: A Robustness Benchmark for Computer Vision	Jun 5, 2019	Adversarial Robustness, Benchmarking	Code Available	1
Meta-Surrogate Benchmarking for Hyperparameter Optimization	May 30, 2019	Benchmarking, Hyperparameter Optimization	Code Available	1
Benchmarking Regression Methods: A comparison with CGAN	May 30, 2019	Benchmarking, Inductive Learning	Code Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	Code Available	1
Benchmarking Natural Language Understanding Services for building Conversational Agents	Mar 13, 2019	Benchmarking, General Classification	Code Available	1
NAS-Bench-101: Towards Reproducible Neural Architecture Search	Feb 25, 2019	Benchmarking, Neural Architecture Search	Code Available	1
The StarCraft Multi-Agent Challenge	Feb 11, 2019	Benchmarking, MuJoCo	Code Available	1
The Liver Tumor Segmentation Benchmark (LiTS)	Jan 13, 2019	Benchmarking, Computed Tomography (CT)	Code Available	1
LEAF: A Benchmark for Federated Settings	Dec 3, 2018	Autonomous Vehicles, Benchmarking	Code Available	1
GuacaMol: Benchmarking Models for De Novo Molecular Design	Nov 22, 2018	Benchmarking, Drug Discovery	Code Available	1
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics	Oct 11, 2018	Benchmarking	Code Available	1
On Evaluation of Embodied Navigation Agents	Jul 18, 2018	Benchmarking	Code Available	1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial Defense, Benchmarking	Code Available	1
Texygen: A Benchmarking Platform for Text Generation Models	Feb 6, 2018	Benchmarking, Diversity	Code Available	1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining	Nov 22, 2017	Benchmarking, feature selection	Code Available	1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1
featsel: A framework for benchmarking of feature selection algorithms and cost functions	Jul 19, 2017	Benchmarking, Computational Efficiency	Code Available	1
Multitask learning and benchmarking with clinical time series data	Mar 22, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset	Nov 28, 2016	Benchmarking, Machine Reading Comprehension	Code Available	1
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action Recognition, Attribute	Code Available	1
Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces	Apr 7, 1994	Active Learning, Benchmarking	Code Available	1
Visual Place Recognition for Large-Scale UAV Applications	Jul 20, 2025	Benchmarking, Diversity	Unverified	0
Training Transformers with Enforced Lipschitz Constants	Jul 17, 2025	Benchmarking	Unverified	0
MUPAX: Multidimensional Problem Agnostic eXplainable AI	Jul 17, 2025	Anatomical Landmark Detection, Audio Classification	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified