| Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI | Mar 16, 2020 | Benchmarking, Explainable Artificial Intelligence (XAI) | Code Available | 1 |
| DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Mar 13, 2020 | Benchmarking, Quantization | Code Available | 1 |
| AirSim Drone Racing Lab | Mar 12, 2020 | Benchmarking, Optical Flow Estimation | Code Available | 1 |
| A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Mar 10, 2020 | Benchmarking, Entity Alignment | Code Available | 1 |
| Benchmarking TinyML Systems: Challenges and Direction | Mar 10, 2020 | Benchmarking, Position | Code Available | 1 |
| Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Mar 6, 2020 | Benchmarking, Image Reconstruction | Code Available | 1 |
| Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | Mar 3, 2020 | Benchmarking, General Classification | Code Available | 1 |
| Image Matching across Wide Baselines: From Paper to Practice | Mar 3, 2020 | Benchmarking | Code Available | 1 |
| End-to-end Emotion-Cause Pair Extraction via Learning to Link | Feb 25, 2020 | Benchmarking, Emotion Cause Extraction | Code Available | 1 |
| Single-cell entropy to quantify the cellular transcription from single-cell RNA-seq data | Feb 15, 2020 | Benchmarking, Classification | Code Available | 1 |
| NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search | Jan 28, 2020 | Benchmarking, Neural Architecture Search | Code Available | 1 |
| CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Jan 22, 2020 | Benchmarking, object-detection | Code Available | 1 |
| An Exploration of Embodied Visual Exploration | Jan 7, 2020 | Benchmarking | Code Available | 1 |
| Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Dec 26, 2019 | Benchmarking, Domain Adaptation | Code Available | 1 |
| Automatic Detection of Generated Text is Easiest when Humans are Fooled | Nov 2, 2019 | Benchmarking, Language Modelling | Code Available | 1 |
| Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison | Oct 24, 2019 | Action Classification, Benchmarking | Code Available | 1 |
| Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch | Oct 22, 2019 | Benchmarking, Person Re-Identification | Code Available | 1 |
| Benchmarking Batch Deep Reinforcement Learning Algorithms | Oct 3, 2019 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 |
| Benchmarking machine learning models on multi-centre eICU critical care dataset | Oct 2, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Sep 4, 2019 | Benchmarking, General Classification | Code Available | 1 |
| miniSAM: A Flexible Factor Graph Non-linear Least Squares Optimization Framework | Sep 3, 2019 | Benchmarking, Motion Planning | Code Available | 1 |
| Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Aug 18, 2019 | Benchmarking, Image Classification | Code Available | 1 |
| SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Aug 7, 2019 | Benchmarking, Relation | Code Available | 1 |
| PyRobot: An Open-source Robotics Framework for Research and Benchmarking | Jun 19, 2019 | Benchmarking, Robotic Grasping | Code Available | 1 |
| MMDetection: Open MMLab Detection Toolbox and Benchmark | Jun 17, 2019 | Benchmarking, Instance Segmentation | Code Available | 1 |
| Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets | Jun 13, 2019 | Benchmarking, Document Classification | Code Available | 1 |
| MNIST-C: A Robustness Benchmark for Computer Vision | Jun 5, 2019 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| Meta-Surrogate Benchmarking for Hyperparameter Optimization | May 30, 2019 | Benchmarking, Hyperparameter Optimization | Code Available | 1 |
| Benchmarking Regression Methods: A comparison with CGAN | May 30, 2019 | Benchmarking, Inductive Learning | Code Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | Code Available | 1 |
| Benchmarking Natural Language Understanding Services for building Conversational Agents | Mar 13, 2019 | Benchmarking, General Classification | Code Available | 1 |
| NAS-Bench-101: Towards Reproducible Neural Architecture Search | Feb 25, 2019 | Benchmarking, Neural Architecture Search | Code Available | 1 |
| The StarCraft Multi-Agent Challenge | Feb 11, 2019 | Benchmarking, MuJoCo | Code Available | 1 |
| The Liver Tumor Segmentation Benchmark (LiTS) | Jan 13, 2019 | Benchmarking, Computed Tomography (CT) | Code Available | 1 |
| LEAF: A Benchmark for Federated Settings | Dec 3, 2018 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| GuacaMol: Benchmarking Models for De Novo Molecular Design | Nov 22, 2018 | Benchmarking, Drug Discovery | Code Available | 1 |
| IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Oct 11, 2018 | Benchmarking | Code Available | 1 |
| On Evaluation of Embodied Navigation Agents | Jul 18, 2018 | Benchmarking | Code Available | 1 |
| Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Jul 4, 2018 | Adversarial Defense, Benchmarking | Code Available | 1 |
| Texygen: A Benchmarking Platform for Text Generation Models | Feb 6, 2018 | Benchmarking, Diversity | Code Available | 1 |
| Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Nov 22, 2017 | Benchmarking, feature selection | Code Available | 1 |
| Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Aug 25, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| featsel: A framework for benchmarking of feature selection algorithms and cost functions | Jul 19, 2017 | Benchmarking, Computational Efficiency | Code Available | 1 |
| Multitask learning and benchmarking with clinical time series data | Mar 22, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Nov 28, 2016 | Benchmarking, Machine Reading Comprehension | Code Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 |
| Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces | Apr 7, 1994 | Active Learning, Benchmarking | Code Available | 1 |
| Visual Place Recognition for Large-Scale UAV Applications | Jul 20, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Training Transformers with Enforced Lipschitz Constants | Jul 17, 2025 | Benchmarking | Unverified | 0 |
| MUPAX: Multidimensional Problem Agnostic eXplainable AI | Jul 17, 2025 | Anatomical Landmark Detection, Audio Classification | Unverified | 0 |