| Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets | Jun 13, 2019 | Benchmarking, Document Classification | Code Available | 1 |
| MNIST-C: A Robustness Benchmark for Computer Vision | Jun 5, 2019 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| Meta-Surrogate Benchmarking for Hyperparameter Optimization | May 30, 2019 | Benchmarking, Hyperparameter Optimization | Code Available | 1 |
| Benchmarking Regression Methods: A comparison with CGAN | May 30, 2019 | Benchmarking, Inductive Learning | Code Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | Code Available | 1 |
| Benchmarking Natural Language Understanding Services for building Conversational Agents | Mar 13, 2019 | Benchmarking, General Classification | Code Available | 1 |
| NAS-Bench-101: Towards Reproducible Neural Architecture Search | Feb 25, 2019 | Benchmarking, Neural Architecture Search | Code Available | 1 |
| The StarCraft Multi-Agent Challenge | Feb 11, 2019 | Benchmarking, MuJoCo | Code Available | 1 |
| The Liver Tumor Segmentation Benchmark (LiTS) | Jan 13, 2019 | Benchmarking, Computed Tomography (CT) | Code Available | 1 |
| LEAF: A Benchmark for Federated Settings | Dec 3, 2018 | Autonomous Vehicles, Benchmarking | Code Available | 1 |
| GuacaMol: Benchmarking Models for De Novo Molecular Design | Nov 22, 2018 | Benchmarking, Drug Discovery | Code Available | 1 |
| IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Oct 11, 2018 | Benchmarking | Code Available | 1 |
| On Evaluation of Embodied Navigation Agents | Jul 18, 2018 | Benchmarking | Code Available | 1 |
| Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Jul 4, 2018 | Adversarial Defense, Benchmarking | Code Available | 1 |
| Texygen: A Benchmarking Platform for Text Generation Models | Feb 6, 2018 | Benchmarking, Diversity | Code Available | 1 |
| Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Nov 22, 2017 | Benchmarking, feature selection | Code Available | 1 |
| Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Aug 25, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| featsel: A framework for benchmarking of feature selection algorithms and cost functions | Jul 19, 2017 | Benchmarking, Computational Efficiency | Code Available | 1 |
| Multitask learning and benchmarking with clinical time series data | Mar 22, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Nov 28, 2016 | Benchmarking, Machine Reading Comprehension | Code Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 |
| Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces | Apr 7, 1994 | Active Learning, Benchmarking | Code Available | 1 |
| Visual Place Recognition for Large-Scale UAV Applications | Jul 20, 2025 | Benchmarking, Diversity | Unverified | 0 |
| Training Transformers with Enforced Lipschitz Constants | Jul 17, 2025 | Benchmarking | Unverified | 0 |
| MUPAX: Multidimensional Problem Agnostic eXplainable AI | Jul 17, 2025 | Anatomical Landmark Detection, Audio Classification | Unverified | 0 |