| Contemporary Symbolic Regression Methods and their Relative Performance | Jul 29, 2021 | Benchmarking, parameter estimation | Code Available | 1 |
| CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Jun 10, 2025 | Benchmarking | Code Available | 1 |
| Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Jun 14, 2020 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarking, energy management | Code Available | 1 |
| CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Jun 18, 2023 | Benchmarking, Retrieval | Code Available | 1 |
| ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Jun 15, 2025 | Benchmarking | Code Available | 1 |
| CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | May 6, 2025 | Benchmarking | Code Available | 1 |
| Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods | Sep 17, 2022 | Benchmarking, Stereo Matching | Code Available | 1 |
| Combinatorial Optimization with Policy Adaptation using Latent Space Search | Nov 13, 2023 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| MatTools: Benchmarking Large Language Models for Materials Science Tools | May 16, 2025 | Benchmarking, Question Answering | Code Available | 1 |
| Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Jul 3, 2024 | Benchmarking, Object | Code Available | 1 |
| Comprehensive benchmarking of large language models for RNA secondary structure prediction | Oct 21, 2024 | Benchmarking | Code Available | 1 |
| Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Dec 26, 2019 | Benchmarking, Domain Adaptation | Code Available | 1 |
| CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Sep 16, 2020 | Benchmarking, Knowledge Graph Completion | Code Available | 1 |
| Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages | Mar 11, 2024 | Benchmarking, Data Augmentation | Code Available | 1 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Mar 25, 2024 | Benchmarking | Code Available | 1 |
| CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Jan 22, 2020 | Benchmarking, object-detection | Code Available | 1 |
| CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Jul 8, 2024 | Benchmarking, knowledge editing | Code Available | 1 |
| Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Feb 27, 2025 | Benchmarking | Code Available | 1 |
| ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Jul 19, 2022 | Benchmarking, Image Registration | Code Available | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Feb 26, 2025 | Benchmarking, Code Generation | Code Available | 1 |
| Bag of Tricks for Adversarial Training | Oct 1, 2020 | Adversarial Robustness, Benchmarking | Code Available | 1 |
| CODEMENV: Benchmarking Large Language Models on Code Migration | Jun 1, 2025 | Benchmarking | Code Available | 1 |
| New Protocols and Negative Results for Textual Entailment Data Collection | Apr 24, 2020 | Benchmarking, Diversity | Code Available | 1 |
| A Critical Assessment of State-of-the-Art in Entity Alignment | Oct 30, 2020 | Benchmarking, Entity Alignment | Code Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | Code Available | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | Code Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Oct 12, 2021 | Benchmarking | Code Available | 1 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | Benchmarking, GPU | Code Available | 1 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | Benchmarking, Continual Learning | Code Available | 1 |
| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | Benchmarking, Depth Completion | Code Available | 1 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | Benchmarking, Physical Simulations | Code Available | 1 |
| CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Nov 10, 2023 | Benchmarking, Cloud Computing | Code Available | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | Code Available | 1 |
| Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Nov 2, 2020 | Benchmarking | Code Available | 1 |
| CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions | Jun 26, 2025 | Benchmarking, Drug Design | Code Available | 1 |
| AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery | Oct 31, 2024 | Benchmarking, Cloud Removal | Code Available | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | Code Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | Code Available | 1 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classification, Benchmarking | Code Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | Benchmarking, Causal Discovery | Code Available | 1 |
| Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Mar 18, 2024 | Benchmarking, object-detection | Code Available | 1 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | Benchmarking, Prompt Engineering | Code Available | 1 |
| BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Jun 2, 2023 | Benchmarking, Language Acquisition | Code Available | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | Benchmarking, DeepFake Detection | Code Available | 1 |
| CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Mar 19, 2023 | Benchmarking, object-detection | Code Available | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Jun 15, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1 |