Benchmarking

Papers

Showing 1451–1500 of 5548 papers

Title	Date	Tasks	Status	Hype
Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI	Mar 16, 2020	Benchmarking, Explainable Artificial Intelligence (XAI)	Code Available	1
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training	Mar 13, 2020	Benchmarking, Quantization	Code Available	1
AirSim Drone Racing Lab	Mar 12, 2020	Benchmarking, Optical Flow Estimation	Code Available	1
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs	Mar 10, 2020	Benchmarking, Entity Alignment	Code Available	1
Benchmarking TinyML Systems: Challenges and Direction	Mar 10, 2020	Benchmarking, Position	Code Available	1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets	Mar 6, 2020	Benchmarking, Image Reconstruction	Code Available	1
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications	Mar 3, 2020	Benchmarking, General Classification	Code Available	1
Image Matching across Wide Baselines: From Paper to Practice	Mar 3, 2020	Benchmarking	Code Available	1
End-to-end Emotion-Cause Pair Extraction via Learning to Link	Feb 25, 2020	Benchmarking, Emotion Cause Extraction	Code Available	1
Single-cell entropy to quantify the cellular transcription from single-cell RNA-seq data	Feb 15, 2020	Benchmarking, Classification	Code Available	1
NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search	Jan 28, 2020	Benchmarking, Neural Architecture Search	Code Available	1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarking, object-detection	Code Available	1
An Exploration of Embodied Visual Exploration	Jan 7, 2020	Benchmarking	Code Available	1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation	Dec 26, 2019	Benchmarking, Domain Adaptation	Code Available	1
Automatic Detection of Generated Text is Easiest when Humans are Fooled	Nov 2, 2019	Benchmarking, Language Modelling	Code Available	1
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison	Oct 24, 2019	Action Classification, Benchmarking	Code Available	1
Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch	Oct 22, 2019	Benchmarking, Person Re-Identification	Code Available	1
Benchmarking Batch Deep Reinforcement Learning Algorithms	Oct 3, 2019	Benchmarking, Deep Reinforcement Learning	Code Available	1
Benchmarking machine learning models on multi-centre eICU critical care dataset	Oct 2, 2019	Benchmarking, BIG-bench Machine Learning	Code Available	1
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction	Sep 4, 2019	Benchmarking, General Classification	Code Available	1
miniSAM: A Flexible Factor Graph Non-linear Least Squares Optimization Framework	Sep 3, 2019	Benchmarking, Motion Planning	Code Available	1
Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks	Aug 18, 2019	Benchmarking, Image Classification	Code Available	1
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition	Aug 7, 2019	Benchmarking, Relation	Code Available	1
PyRobot: An Open-source Robotics Framework for Research and Benchmarking	Jun 19, 2019	Benchmarking, Robotic Grasping	Code Available	1
MMDetection: Open MMLab Detection Toolbox and Benchmark	Jun 17, 2019	Benchmarking, Instance Segmentation	Code Available	1
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets	Jun 13, 2019	Benchmarking, Document Classification	Code Available	1
MNIST-C: A Robustness Benchmark for Computer Vision	Jun 5, 2019	Adversarial Robustness, Benchmarking	Code Available	1
Meta-Surrogate Benchmarking for Hyperparameter Optimization	May 30, 2019	Benchmarking, Hyperparameter Optimization	Code Available	1
Benchmarking Regression Methods: A comparison with CGAN	May 30, 2019	Benchmarking, Inductive Learning	Code Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	Code Available	1
Benchmarking Natural Language Understanding Services for building Conversational Agents	Mar 13, 2019	Benchmarking, General Classification	Code Available	1
NAS-Bench-101: Towards Reproducible Neural Architecture Search	Feb 25, 2019	Benchmarking, Neural Architecture Search	Code Available	1
The StarCraft Multi-Agent Challenge	Feb 11, 2019	Benchmarking, MuJoCo	Code Available	1
The Liver Tumor Segmentation Benchmark (LiTS)	Jan 13, 2019	Benchmarking, Computed Tomography (CT)	Code Available	1
LEAF: A Benchmark for Federated Settings	Dec 3, 2018	Autonomous Vehicles, Benchmarking	Code Available	1
GuacaMol: Benchmarking Models for De Novo Molecular Design	Nov 22, 2018	Benchmarking, Drug Discovery	Code Available	1
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics	Oct 11, 2018	Benchmarking	Code Available	1
On Evaluation of Embodied Navigation Agents	Jul 18, 2018	Benchmarking	Code Available	1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial Defense, Benchmarking	Code Available	1
Texygen: A Benchmarking Platform for Text Generation Models	Feb 6, 2018	Benchmarking, Diversity	Code Available	1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining	Nov 22, 2017	Benchmarking, feature selection	Code Available	1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1
featsel: A framework for benchmarking of feature selection algorithms and cost functions	Jul 19, 2017	Benchmarking, Computational Efficiency	Code Available	1
Multitask learning and benchmarking with clinical time series data	Mar 22, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset	Nov 28, 2016	Benchmarking, Machine Reading Comprehension	Code Available	1
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action Recognition, Attribute	Code Available	1
Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces	Apr 7, 1994	Active Learning, Benchmarking	Code Available	1
Visual Place Recognition for Large-Scale UAV Applications	Jul 20, 2025	Benchmarking, Diversity	Unverified	0
Training Transformers with Enforced Lipschitz Constants	Jul 17, 2025	Benchmarking	Unverified	0
MUPAX: Multidimensional Problem Agnostic eXplainable AI	Jul 17, 2025	Anatomical Landmark Detection, Audio Classification	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified