| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | Code Available | 1 |
| Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Jul 17, 2023 | Benchmarking, Graph Clustering | Code Available | 1 |
| Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization | Dec 27, 2021 | Bayesian Optimization, Benchmarking | Code Available | 1 |
| Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | May 23, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Feb 9, 2021 | Benchmarking, Q-Learning | Code Available | 1 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classification, Benchmarking | Code Available | 1 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | Benchmarking, Decoder | Code Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 |
| AirSim Drone Racing Lab | Mar 12, 2020 | Benchmarking, Optical Flow Estimation | Code Available | 1 |
| FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Mar 27, 2025 | Attribute, Benchmarking | Code Available | 1 |
| Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Apr 18, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 |
| Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Mar 26, 2022 | Benchmarking, Question Answering | Code Available | 1 |
| Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Aug 25, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| A SWAT-based Reinforcement Learning Framework for Crop Management | Feb 10, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| featsel: A framework for benchmarking of feature selection algorithms and cost functions | Jul 19, 2017 | Benchmarking, Computational Efficiency | Code Available | 1 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | Benchmarking, Memorization | Code Available | 1 |
| Benchmarking Adversarial Patch Against Aerial Detection | Oct 30, 2022 | Benchmarking | Code Available | 1 |
| Benchmarking Data Science Agents | Feb 27, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code Available | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Adversarial Robustness on Image Classification | Jun 1, 2020 | Adversarial Attack, Adversarial Robustness | Code Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | Benchmarking, Causal Discovery | Code Available | 1 |
| FineSurE: Fine-grained Summarization Evaluation using LLMs | Jul 1, 2024 | Benchmarking, Hallucination | Code Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarking, energy management | Code Available | 1 |