| 2017 Robotic Instrument Segmentation Challenge | Feb 18, 2019 | Benchmarking, Person Re-Identification | Code Available | 0 |
| AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias | Oct 3, 2018 | Benchmarking, Decision Making | Code Available | 0 |
| Benchmarking Intersectional Biases in NLP | Jul 1, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations | Dec 7, 2020 | Benchmarking, Goal-Oriented Dialog | Code Available | 0 |
| Towards Fair and Privacy-Preserving Federated Deep Models | Jun 4, 2019 | Benchmarking, Deep Learning | Code Available | 0 |
| SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs | May 24, 2025 | Benchmarking | Code Available | 0 |
| Deep Neural Network Benchmarks for Selective Classification | Jan 23, 2024 | Benchmarking, Classification | Code Available | 0 |
| Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Jul 17, 2024 | Benchmarking | Code Available | 0 |
| Arabic Speech Recognition by End-to-End, Modular Systems and Human | Jan 21, 2021 | Arabic Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Jan 21, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Deep Metric Learning Meets Deep Clustering: An Novel Unsupervised Approach for Feature Embedding | Sep 9, 2020 | Benchmarking, Clustering | Code Available | 0 |
| Deepened Graph Auto-Encoders Help Stabilize and Enhance Link Prediction | Mar 21, 2021 | Benchmarking, Clustering | Code Available | 0 |
| Oral Imaging for Malocclusion Issues Assessments: OMNI Dataset, Deep Learning Baselines and Benchmarking | May 21, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Jul 9, 2025 | Benchmarking, Image Retrieval | Code Available | 0 |
| ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization | Oct 17, 2024 | Benchmarking, Stance Detection | Code Available | 0 |
| Benchmarking Human and Automated Prompting in the Segment Anything Model | Oct 29, 2024 | Benchmarking, Image Segmentation | Code Available | 0 |
| Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | Jun 1, 2023 | Benchmarking, Decoder | Code Available | 0 |
| Deep Emotion Recognition in Textual Conversations: A Survey | Nov 16, 2022 | Benchmarking, Emotion Recognition | Code Available | 0 |
| Neural Style Transfer Improves 3D Cardiovascular MR Image Segmentation on Inconsistent Data | Sep 20, 2019 | Benchmarking, Ensemble Learning | Code Available | 0 |
| OSS-Bench: Benchmark Generator for Coding LLMs | May 18, 2025 | Benchmarking | Code Available | 0 |
| DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network | Feb 4, 2019 | Benchmarking, Specificity | Code Available | 0 |
| deepCR: Cosmic Ray Rejection with Deep Learning | Jul 22, 2019 | Benchmarking, CPU | Code Available | 0 |
| A quantum-classical reinforcement learning model to play Atari games | Dec 11, 2024 | Atari Games, Benchmarking | Code Available | 0 |
| Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images | Sep 23, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment | Jul 12, 2024 | Benchmarking, Decision Making | Code Available | 0 |
| Out of Distribution Detection on ImageNet-O | Jan 23, 2022 | Benchmarking, Out-of-Distribution Detection | Code Available | 0 |
| Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Jun 23, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Deep Affinity Network for Multiple Object Tracking | Oct 28, 2018 | Benchmarking, Multiple Object Tracking | Code Available | 0 |
| Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization | Jul 25, 2019 | Benchmarking | Code Available | 0 |
| Benchmarking Hierarchical Script Knowledge | Jun 1, 2019 | Benchmarking | Code Available | 0 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Feb 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| Towards IID representation learning and its application on biomedical data | Mar 1, 2022 | Benchmarking, Representation Learning | Code Available | 0 |
| A projected nonlinear state-space model for forecasting time series signals | Nov 22, 2023 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation | Jun 5, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Dealing with missing data using attention and latent space regularization | Nov 14, 2022 | Benchmarking, Imputation | Code Available | 0 |
| DCR: Quantifying Data Contamination in LLMs Evaluation | Jul 15, 2025 | Arithmetic Reasoning, Benchmarking | Code Available | 0 |
| DateLogicQA: Benchmarking Temporal Biases in Large Language Models | Dec 17, 2024 | Benchmarking | Code Available | 0 |
| Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation | May 10, 2022 | Attribute, Benchmarking | Code Available | 0 |
| A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data | Nov 11, 2019 | Benchmarking, Decision Making | Code Available | 0 |
| Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective | Mar 3, 2023 | Benchmarking, Image Classification | Code Available | 0 |
| Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Sep 18, 2024 | Benchmarking, Model Selection | Code Available | 0 |
| CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions | Sep 14, 2020 | Benchmarking, Continual Learning | Code Available | 0 |
| CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization | Jun 1, 2018 | Benchmarking, geo-localization | Code Available | 0 |
| SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | Mar 14, 2024 | Benchmarking, Dimensionality Reduction | Code Available | 0 |
| Partial Rankings of Optimizers | Feb 26, 2024 | Benchmarking | Code Available | 0 |
| A predictive analytics approach for stroke prediction using machine learning and neural networks | Mar 1, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p | Oct 17, 2024 | Benchmarking, regression | Code Available | 0 |