| Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Jul 3, 2024 | BenchmarkingObject | CodeCode Available | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | CodeCode Available | 1 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Oct 12, 2021 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Distribution Shift in Tabular Data with TableShift | Dec 10, 2023 | BenchmarkingBinary Classification | CodeCode Available | 1 |
| Benchmarking Object Detectors with COCO: A New Path Forward | Mar 27, 2024 | BenchmarkingObject | CodeCode Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Aug 2, 2024 | Adversarial AttackAdversarial Purification | CodeCode Available | 1 |
| The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation | Mar 15, 2021 | BenchmarkingDomain Adaptation | CodeCode Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | CodeCode Available | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Feb 26, 2025 | BenchmarkingCode Generation | CodeCode Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | BenchmarkingCloud Computing | CodeCode Available | 1 |
| Benchmarking Econometric and Machine Learning Methodologies in Nowcasting | May 6, 2022 | BenchmarkingBIG-bench Machine Learning | CodeCode Available | 1 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | May 27, 2022 | BenchmarkingDomain Generalization | CodeCode Available | 1 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | BenchmarkingImage Captioning | CodeCode Available | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | BenchmarkingContinual Learning | CodeCode Available | 1 |
| Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Jul 10, 2024 | BenchmarkingDiagnostic | CodeCode Available | 1 |
| Benchmarking Graph Neural Networks for FMRI analysis | Nov 16, 2022 | Benchmarking | CodeCode Available | 1 |
| Benchmarking End-to-End Behavioural Cloning on Video Games | Apr 2, 2020 | Behavioural cloningBenchmarking | CodeCode Available | 1 |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Jul 28, 2023 | Benchmarkingreinforcement-learning | CodeCode Available | 1 |
| Mitigating Gender Bias in Captioning Systems | Jun 15, 2020 | BenchmarkingGender Prediction | CodeCode Available | 1 |
| AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Jul 11, 2023 | Benchmarking | CodeCode Available | 1 |
| MLLM-DataEngine: An Iterative Refinement Approach for MLLM | Aug 25, 2023 | Benchmarking | CodeCode Available | 1 |
| 3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Oct 16, 2023 | Action RecognitionBenchmarking | CodeCode Available | 1 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | BenchmarkingGPU | CodeCode Available | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | CodeCode Available | 1 |