| Out of Distribution Detection on ImageNet-O | Jan 23, 2022 | Benchmarking, Out-of-Distribution Detection | Code Available | 0 |
| Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Jun 23, 2025 | Benchmarking, Diversity | Code Available | 0 |
| Deep Affinity Network for Multiple Object Tracking | Oct 28, 2018 | Benchmarking, Multiple Object Tracking | Code Available | 0 |
| Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization | Jul 25, 2019 | Benchmarking | Code Available | 0 |
| Benchmarking Hierarchical Script Knowledge | Jun 1, 2019 | Benchmarking | Code Available | 0 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Feb 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Dec 20, 2024 | Benchmarking, Optical Character Recognition | Code Available | 0 |
| Towards IID representation learning and its application on biomedical data | Mar 1, 2022 | Benchmarking, Representation Learning | Code Available | 0 |
| A projected nonlinear state-space model for forecasting time series signals | Nov 22, 2023 | Benchmarking, Computational Efficiency | Code Available | 0 |
| Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation | Jun 5, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| Dealing with missing data using attention and latent space regularization | Nov 14, 2022 | Benchmarking, Imputation | Code Available | 0 |
| DCR: Quantifying Data Contamination in LLMs Evaluation | Jul 15, 2025 | Arithmetic Reasoning, Benchmarking | Code Available | 0 |
| DateLogicQA: Benchmarking Temporal Biases in Large Language Models | Dec 17, 2024 | Benchmarking | Code Available | 0 |
| Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation | May 10, 2022 | Attribute, Benchmarking | Code Available | 0 |
| A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data | Nov 11, 2019 | Benchmarking, Decision Making | Code Available | 0 |
| Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective | Mar 3, 2023 | Benchmarking, Image Classification | Code Available | 0 |
| Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | May 2, 2025 | Benchmarking | Code Available | 0 |
| PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Sep 18, 2024 | Benchmarking, Model Selection | Code Available | 0 |
| CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions | Sep 14, 2020 | Benchmarking, Continual Learning | Code Available | 0 |
| CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization | Jun 1, 2018 | Benchmarking, geo-localization | Code Available | 0 |
| SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | Mar 14, 2024 | Benchmarking, Dimensionality Reduction | Code Available | 0 |
| Partial Rankings of Optimizers | Feb 26, 2024 | Benchmarking | Code Available | 0 |
| A predictive analytics approach for stroke prediction using machine learning and neural networks | Mar 1, 2022 | Benchmarking, BIG-bench Machine Learning | Code Available | 0 |
| Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p | Oct 17, 2024 | Benchmarking, regression | Code Available | 0 |