| Benchmarking Differential Privacy and Federated Learning for BERT Models | Jun 26, 2021 | BenchmarkingFederated Learning | CodeCode Available | 1 |
| Accelerated and interpretable oblique random survival forests | Aug 1, 2022 | BenchmarkingComputational Efficiency | CodeCode Available | 1 |
| A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Sep 16, 2022 | BenchmarkingDeep Learning | CodeCode Available | 1 |
| Benchmarking Distribution Shift in Tabular Data with TableShift | Dec 10, 2023 | BenchmarkingBinary Classification | CodeCode Available | 1 |
| The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation | Mar 15, 2021 | BenchmarkingDomain Adaptation | CodeCode Available | 1 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Nov 26, 2024 | BenchmarkingText-to-Video Generation | CodeCode Available | 1 |
| Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Nov 2, 2020 | Benchmarking | CodeCode Available | 1 |
| MetaFormer and CNN Hybrid Model for Polyp Image Segmentation | Sep 16, 2024 | BenchmarkingImage Segmentation | CodeCode Available | 1 |
| Benchmarking: Past, Present and Future | Aug 1, 2021 | BenchmarkingReading Comprehension | CodeCode Available | 1 |
| MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | ArticlesBenchmarking | CodeCode Available | 1 |
| MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | May 27, 2022 | BenchmarkingDomain Generalization | CodeCode Available | 1 |
| Benchmarking Econometric and Machine Learning Methodologies in Nowcasting | May 6, 2022 | BenchmarkingBIG-bench Machine Learning | CodeCode Available | 1 |
| Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Jul 3, 2024 | BenchmarkingObject | CodeCode Available | 1 |
| minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models | Mar 24, 2022 | BenchmarkingSentence | CodeCode Available | 1 |
| MIRFLEX: Music Information Retrieval Feature Library for Extraction | Nov 1, 2024 | BenchmarkingInformation Retrieval | CodeCode Available | 1 |
| Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Jul 10, 2024 | BenchmarkingDiagnostic | CodeCode Available | 1 |
| Benchmarking Image Retrieval for Visual Localization | Nov 24, 2020 | Autonomous DrivingBenchmarking | CodeCode Available | 1 |
| Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Feb 27, 2025 | Benchmarking | CodeCode Available | 1 |
| CodeUpdateArena: Benchmarking Knowledge Editing on API Updates | Jul 8, 2024 | Benchmarkingknowledge editing | CodeCode Available | 1 |
| MLonMCU: TinyML Benchmarking with Fast Retargeting | Jun 15, 2023 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | BenchmarkingMath | CodeCode Available | 1 |
| CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Sep 16, 2020 | BenchmarkingKnowledge Graph Completion | CodeCode Available | 1 |
| MMDetection: Open MMLab Detection Toolbox and Benchmark | Jun 17, 2019 | BenchmarkingInstance Segmentation | CodeCode Available | 1 |
| New Protocols and Negative Results for Textual Entailment Data Collection | Apr 24, 2020 | BenchmarkingDiversity | CodeCode Available | 1 |
| CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation | Feb 26, 2025 | BenchmarkingCode Generation | CodeCode Available | 1 |
| MNIST-C: A Robustness Benchmark for Computer Vision | Jun 5, 2019 | Adversarial RobustnessBenchmarking | CodeCode Available | 1 |
| CODEMENV: Benchmarking Large Language Models on Code Migration | Jun 1, 2025 | Benchmarking | CodeCode Available | 1 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | Dec 10, 2021 | Benchmarking | CodeCode Available | 1 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Aug 2, 2024 | Adversarial AttackAdversarial Purification | CodeCode Available | 1 |
| MONICA: Benchmarking on Long-tailed Medical Image Classification | Oct 2, 2024 | BenchmarkingClassification | CodeCode Available | 1 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Dec 7, 2022 | Benchmarking | CodeCode Available | 1 |
| CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Jan 22, 2020 | Benchmarkingobject-detection | CodeCode Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | BenchmarkingCombinatorial Optimization | CodeCode Available | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | CodeCode Available | 1 |
| COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Mar 15, 2019 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Mar 28, 2024 | Benchmarking | CodeCode Available | 1 |
| API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Feb 23, 2024 | Benchmarkingslot-filling | CodeCode Available | 1 |
| 3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Oct 16, 2023 | Action RecognitionBenchmarking | CodeCode Available | 1 |
| Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Feb 21, 2025 | Benchmarking | CodeCode Available | 1 |
| Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | May 21, 2024 | BenchmarkingKeypoint Detection | CodeCode Available | 1 |
| Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Sep 21, 2023 | BenchmarkingClassification | CodeCode Available | 1 |
| Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning | Oct 5, 2024 | BenchmarkingDrug Design | CodeCode Available | 1 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Oct 12, 2021 | Benchmarking | CodeCode Available | 1 |
| MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery | Feb 18, 2022 | BenchmarkingRepresentation Learning | CodeCode Available | 1 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Mar 25, 2024 | Benchmarking | CodeCode Available | 1 |
| Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Jul 9, 2021 | BenchmarkingDocument Classification | CodeCode Available | 1 |
| MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark | Oct 20, 2023 | Benchmarkingde-en | CodeCode Available | 1 |
| MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data | Sep 29, 2023 | BenchmarkingContrastive Learning | CodeCode Available | 1 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarkingenergy management | CodeCode Available | 1 |
| Working Memory Capacity of ChatGPT: An Empirical Study | Apr 30, 2023 | BenchmarkingLanguage Modeling | CodeCode Available | 1 |