| Beyond MD17: the reactive xxMD dataset | Aug 22, 2023 | Benchmarking, Computational chemistry | Code Available | 0 |
| The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R | Jan 20, 2017 | Benchmarking | Code Available | 0 |
| Learning to Transfer for Traffic Forecasting via Multi-task Learning | Nov 27, 2021 | Benchmarking, Domain Adaptation | Code Available | 0 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Jan 8, 2025 | Benchmarking | Code Available | 0 |
| InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding | Code Available | 0 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoML, Benchmarking | Code Available | 0 |
| BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Nov 14, 2024 | Adversarial Attack, Adversarial Robustness | Code Available | 0 |
| RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification | Feb 5, 2022 | Benchmarking, Binary Classification | Code Available | 0 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | Benchmarking, Decision Making | Code Available | 0 |
| UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | Oct 17, 2024 | Benchmarking | Code Available | 0 |
| Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Oct 8, 2014 | Benchmarking | Code Available | 0 |
| INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Jun 10, 2024 | Benchmarking, Emotion Recognition | Code Available | 0 |
| Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Mar 11, 2025 | Benchmarking, Hyperparameter Optimization | Code Available | 0 |
| BdSLW60: A Word-Level Bangla Sign Language Dataset | Feb 13, 2024 | Benchmarking, Gesture Recognition | Code Available | 0 |
| The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Feb 15, 2024 | Benchmarking, Model Editing | Code Available | 0 |
| Integrating Expert Knowledge into Logical Programs via LLMs | Feb 17, 2025 | Benchmarking, Logical Reasoning | Code Available | 0 |
| The CaLiGraph Ontology as a Challenge for OWL Reasoners | Oct 11, 2021 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Jun 9, 2025 | Active Learning, Benchmarking | Code Available | 0 |
| Strong and Simple Baselines for Multimodal Utterance Embeddings | May 14, 2019 | Benchmarking | Code Available | 0 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Dec 23, 2021 | Benchmarking, Deep Learning | Code Available | 0 |
| The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps | Jun 12, 2020 | Benchmarking, object-detection | Code Available | 0 |
| a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Mar 3, 2024 | Benchmarking, Speaker Verification | Code Available | 0 |
| Resource Interoperability for Sustainable Benchmarking: The Case of Events | May 1, 2018 | Benchmarking | Code Available | 0 |
| Bayesian Neural Networks with Soft Evidence | Oct 19, 2020 | Benchmarking | Code Available | 0 |
| BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | May 27, 2023 | Benchmarking, Deblurring | Code Available | 0 |