| Beyond MD17: the reactive xxMD dataset | Aug 22, 2023 | Benchmarking, Computational chemistry | Code Available | 0 |
| The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R | Jan 20, 2017 | Benchmarking | Code Available | 0 |
| Learning to Transfer for Traffic Forecasting via Multi-task Learning | Nov 27, 2021 | Benchmarking, Domain Adaptation | Code Available | 0 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Jan 8, 2025 | Benchmarking | Code Available | 0 |
| InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding | Code Available | 0 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoML, Benchmarking | Code Available | 0 |
| BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Nov 14, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification | Feb 5, 2022 | Benchmarking, Binary Classification | Code Available | 0 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | Benchmarking, Decision Making | Code Available | 0 |
| UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | Oct 17, 2024 | Benchmarking | Code Available | 0 |
| Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Oct 8, 2014 | Benchmarking | Code Available | 0 |
| INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Jun 10, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 |
| Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Mar 11, 2025 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| BdSLW60: A Word-Level Bangla Sign Language Dataset | Feb 13, 2024 | Benchmarking, Gesture Recognition | Code Available | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Feb 15, 2024 | Benchmarking, Model Editing | Code Available | 0 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Feb 17, 2025 | Benchmarking, Logical Reasoning | Code Available | 0 |
| The CaLiGraph Ontology as a Challenge for OWL Reasoners | Oct 11, 2021 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Jun 9, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Strong and Simple Baselines for Multimodal Utterance Embeddings | May 14, 2019 | Benchmarking | Code Available | 0 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Dec 23, 2021 | Benchmarking, Deep Learning | Code Available | 0 |
| The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps | Jun 12, 2020 | Benchmarking, object-detection | Code Available | 0 |
| a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Mar 3, 2024 | Benchmarking, Speaker Verification | Code Available | 0 |
| Resource Interoperability for Sustainable Benchmarking: The Case of Events | May 1, 2018 | Benchmarking | Code Available | 0 |
| Bayesian Neural Networks with Soft Evidence | Oct 19, 2020 | Benchmarking | Code Available | 0 |
| BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | May 27, 2023 | Benchmarking, Deblurring | Code Available | 0 |
| Bugs in the Data: How ImageNet Misrepresents Biodiversity | Aug 24, 2022 | Benchmarking, Object Detection | Code Available | 0 |
| inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Jun 25, 2025 | Benchmarking | Code Available | 0 |
| LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English | Oct 12, 2024 | Benchmarking | Code Available | 0 |
| InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | May 28, 2023 | Benchmarking, Decision Making | Code Available | 0 |
| Individual Fairness Guarantees for Neural Networks | May 11, 2022 | Benchmarking, Fairness | Code Available | 0 |
| IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Mar 29, 2024 | Benchmarking, Sentence | Code Available | 0 |
| LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques | Apr 18, 2017 | Benchmarking | Code Available | 0 |
| BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images | Sep 7, 2018 | Benchmarking | Code Available | 0 |
| bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | May 30, 2022 | Benchmarking | Code Available | 0 |
| Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods | Jun 1, 2020 | Adversarial Robustness, Benchmarking | Code Available | 0 |
| Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Feb 6, 2025 | Benchmarking, DeepFake Detection | Code Available | 0 |
| BSBench: will your LLM find the largest prime number? | Jun 5, 2025 | Benchmarking | Code Available | 0 |
| Light Field Saliency Detection with Deep Convolutional Networks | Jun 19, 2019 | Benchmarking, Saliency Detection | Code Available | 0 |
| Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Apr 4, 2021 | Benchmarking, Multi Label Text Classification | Code Available | 0 |
| Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Apr 29, 2025 | Benchmarking, Fairness | Code Available | 0 |
| An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science | Feb 23, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs | Jul 6, 2024 | Benchmarking, Dataset Generation | Code Available | 0 |
| On-orbit model training for satellite imagery with label proportions | Jun 21, 2023 | Benchmarking, Earth Observation | Code Available | 0 |
| LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping | Feb 27, 2025 | Benchmarking | Code Available | 0 |
| Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Jun 10, 2024 | Benchmarking, Decoder | Code Available | 0 |
| Rethinking the Reference-based Distinctive Image Captioning | Jul 22, 2022 | Attribute, Benchmarking | Code Available | 0 |
| Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Sep 12, 2024 | Benchmarking | Code Available | 0 |
| BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Feb 7, 2024 | Benchmarking | Code Available | 0 |
| BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Jan 2, 2025 | Benchmarking, Experimental Design | Code Available | 0 |
| BONES: a Benchmark fOr Neural Estimation of Shapley values | Jul 23, 2024 | Benchmarking | Code Available | 0 |