| An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science | Feb 23, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs | Jul 6, 2024 | Benchmarking, Dataset Generation | Code Available | 0 |
| On-orbit model training for satellite imagery with label proportions | Jun 21, 2023 | Benchmarking, Earth Observation | Code Available | 0 |
| LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping | Feb 27, 2025 | Benchmarking | Code Available | 0 |
| Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Jun 10, 2024 | Benchmarking, Decoder | Code Available | 0 |
| Rethinking the Reference-based Distinctive Image Captioning | Jul 22, 2022 | Attribute, Benchmarking | Code Available | 0 |
| Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Sep 12, 2024 | Benchmarking | Code Available | 0 |
| BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Feb 7, 2024 | Benchmarking | Code Available | 0 |
| BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Jan 2, 2025 | Benchmarking, Experimental Design | Code Available | 0 |
| BONES: a Benchmark fOr Neural Estimation of Shapley values | Jul 23, 2024 | Benchmarking | Code Available | 0 |