| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Oct 21, 2022 | Benchmarking, Community Detection | Code Available | 1 | 5 |
| Replication in Visual Diffusion Models: A Survey and Outlook | Jul 7, 2024 | Benchmarking, Survey | Code Available | 1 | 5 |
| AIPerf: Automated machine learning as an AI-HPC benchmark | Aug 17, 2020 | AutoML, Benchmarking | Code Available | 1 | 5 |
| CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Mar 12, 2025 | Benchmarking, Code Classification | Code Available | 1 | 5 |
| Benchmarking LLMs' Swarm intelligence | May 7, 2025 | Benchmarking | Code Available | 1 | 5 |
| IMGTB: A Framework for Machine-Generated Text Detection Benchmarking | Nov 21, 2023 | Benchmarking, Text Detection | Code Available | 1 | 5 |
| 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Apr 28, 2024 | Benchmarking | Code Available | 1 | 5 |
| Can 3D Vision-Language Models Truly Understand Natural Language? | Mar 21, 2024 | Benchmarking, Diversity | Code Available | 1 | 5 |
| Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Sep 25, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | May 22, 2025 | Benchmarking | Code Available | 1 | 5 |