| DACBench: A Benchmark Library for Dynamic Algorithm Configuration | May 18, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Sep 7, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data | Apr 16, 2021 | BenchmarkingCausal Discovery | CodeCode Available | 1 | 5 |
| DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation | Oct 11, 2022 | 6D Pose Estimation6D Pose Estimation using RGB | CodeCode Available | 1 | 5 |
| Curious Hierarchical Actor-Critic Reinforcement Learning | May 7, 2020 | BenchmarkingHierarchical Reinforcement Learning | CodeCode Available | 1 | 5 |
| Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Aug 8, 2023 | BenchmarkingGPU | CodeCode Available | 1 | 5 |
| Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT | Jun 13, 2024 | BenchmarkingLLM-generated Text Detection | CodeCode Available | 1 | 5 |
| CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version) | Nov 19, 2022 | BenchmarkingC++ code | CodeCode Available | 1 | 5 |
| CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Oct 23, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Mar 24, 2025 | BenchmarkingHumanitarian | CodeCode Available | 1 | 5 |
| CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Dec 2, 2021 | BenchmarkingOrdinal Classification | CodeCode Available | 1 | 5 |
| CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models | Jan 2, 2025 | BenchmarkingComputer Security | CodeCode Available | 1 | 5 |
| An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Sep 4, 2019 | BenchmarkingGeneral Classification | CodeCode Available | 1 | 5 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | CodeCode Available | 1 | 5 |
| Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation | Oct 11, 2024 | BenchmarkingImage Segmentation | CodeCode Available | 1 | 5 |
| D2S: Document-to-Slide Generation Via Query-Based Text Summarization | May 8, 2021 | BenchmarkingLong Form Question Answering | CodeCode Available | 1 | 5 |
| Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | May 19, 2025 | BenchmarkingChatbot | CodeCode Available | 1 | 5 |
| COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Mar 19, 2023 | BenchmarkingEvent Extraction | CodeCode Available | 1 | 5 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Jun 26, 2025 | BenchmarkingDrug Design | CodeCode Available | 1 | 5 |
| An Empirical Study on Google Research Football Multi-agent Scenarios | May 16, 2023 | BenchmarkingMulti-agent Reinforcement Learning | CodeCode Available | 1 | 5 |
| Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Mar 23, 2024 | BenchmarkingImage to Point Cloud Registration | CodeCode Available | 1 | 5 |
| Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark | Mar 9, 2024 | BenchmarkingFairness | CodeCode Available | 1 | 5 |
| An Empirical Study of GPT-4o Image Generation Capabilities | Apr 8, 2025 | BenchmarkingImage Generation | CodeCode Available | 1 | 5 |
| Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Nov 6, 2023 | BenchmarkingKnowledge Base Question Answering | CodeCode Available | 1 | 5 |
| Benchmarking Graph Neural Networks for FMRI analysis | Nov 16, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks | May 16, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Jun 10, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Dec 26, 2019 | BenchmarkingDomain Adaptation | CodeCode Available | 1 | 5 |
| CriticBench: Benchmarking LLMs for Critique-Correct Reasoning | Feb 22, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Jun 4, 2024 | BenchmarkingClustering | CodeCode Available | 1 | 5 |
| A BFS-Tree of Ranking References for Unsupervised Manifold Learning | Sep 24, 2020 | BenchmarkingImage Retrieval | CodeCode Available | 1 | 5 |
| Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Aug 14, 2023 | BenchmarkingDrug Design | CodeCode Available | 1 | 5 |
| Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Nov 15, 2023 | BenchmarkingInstruction Following | CodeCode Available | 1 | 5 |
| CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Nov 27, 2024 | BenchmarkingEarth Observation | CodeCode Available | 1 | 5 |
| Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Nov 30, 2023 | BenchmarkingOpenAI Gym | CodeCode Available | 1 | 5 |
| Replication in Visual Diffusion Models: A Survey and Outlook | Jul 7, 2024 | BenchmarkingSurvey | CodeCode Available | 1 | 5 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Sep 29, 2021 | BenchmarkingDynamic Link Prediction | CodeCode Available | 1 | 5 |
| CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Feb 4, 2023 | Adversarial AttackAdversarial Robustness | CodeCode Available | 1 | 5 |
| dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing | Apr 27, 2021 | BenchmarkingRetrieval | CodeCode Available | 1 | 5 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Jun 14, 2020 | BenchmarkingDeep Reinforcement Learning | CodeCode Available | 1 | 5 |
| ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Jun 15, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | May 21, 2024 | BenchmarkingKeypoint Detection | CodeCode Available | 1 | 5 |
| Comprehensive benchmarking of large language models for RNA secondary structure prediction | Oct 21, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Jul 3, 2024 | BenchmarkingObject | CodeCode Available | 1 | 5 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarkingenergy management | CodeCode Available | 1 | 5 |
| CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | May 6, 2025 | Benchmarking | CodeCode Available | 1 | 5 |
| A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data | Jun 20, 2024 | BenchmarkingKolmogorov-Arnold Networks | CodeCode Available | 1 | 5 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Nov 13, 2023 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 | 5 |
| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Jun 18, 2023 | BenchmarkingRetrieval | CodeCode Available | 1 | 5 |
| Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Apr 25, 2024 | Benchmarkingobject-detection | CodeCode Available | 1 | 5 |