| Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | Jul 30, 2024 | Benchmarking, Multiple Instance Learning | —Unverified | 0 | 0 |
| Benchmarking high-fidelity pedestrian tracking systems for research, real-time monitoring and crowd control | Aug 26, 2021 | Benchmarking, Density Estimation | —Unverified | 0 | 0 |
| What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI | Feb 29, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images | May 24, 2024 | Benchmarking, Classification | —Unverified | 0 | 0 |
| ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects | Nov 12, 2021 | Benchmarking, Causal Inference | —Unverified | 0 | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | May 16, 2025 | Benchmarking, Mixture-of-Experts | —Unverified | 0 | 0 |
| MIRAI: Evaluating LLM Agents for Event Forecasting | Jul 1, 2024 | Articles, Benchmarking | —Unverified | 0 | 0 |
| MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning? | Feb 14, 2025 | Benchmarking, In-Context Learning | —Unverified | 0 | 0 |
| Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability | Jun 16, 2022 | Benchmarking, Feature Importance | —Unverified | 0 | 0 |
| Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models | Mar 10, 2025 | All, Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Hebbian learning rules for associative memory | Dec 30, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Mitigating severe over-parameterization in deep convolutional neural networks through forced feature abstraction and compression with an entropy-based heuristic | Jun 27, 2021 | Benchmarking, Feature Compression | —Unverified | 0 | 0 |
| Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | Nov 29, 2023 | Benchmarking, Federated Learning | —Unverified | 0 | 0 |
| A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing | Dec 7, 2024 | Benchmarking, Dimensionality Reduction | —Unverified | 0 | 0 |
| Benchmarking Harmonized Tariff Schedule Classification Models | Dec 4, 2024 | Benchmarking, Classification | —Unverified | 0 | 0 |
| MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Feb 3, 2025 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| Towards Large-Scale Small Object Detection: Survey and Benchmarks | Jul 28, 2022 | Benchmarking, Object | —Unverified | 0 | 0 |
| MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Jul 14, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Towards Long-Term predictions of Turbulence using Neural Operators | Jul 25, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Graph Neural Networks on Link Prediction | Feb 24, 2021 | Benchmarking, Graph Attention | —Unverified | 0 | 0 |
| MLHarness: A Scalable Benchmarking System for MLCommons | Nov 9, 2021 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs | May 12, 2025 | Benchmarking, Document Layout Analysis | —Unverified | 0 | 0 |
| MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale | Sep 25, 2019 | Benchmarking | —Unverified | 0 | 0 |
| MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale | Feb 19, 2020 | Benchmarking | —Unverified | 0 | 0 |
| A Dataset for Movie Description | Jan 12, 2015 | Benchmarking, Descriptive | —Unverified | 0 | 0 |
| Benchmarking Graph Learning for Drug-Drug Interaction Prediction | Oct 24, 2024 | Benchmarking, Graph Learning | —Unverified | 0 | 0 |
| A Dataset for Developing and Benchmarking Active Vision | Feb 27, 2017 | Benchmarking, General Classification | —Unverified | 0 | 0 |
| Benchmarking GPUs on SVBRDF Extractor Model | Oct 19, 2023 | Benchmarking, GPU | —Unverified | 0 | 0 |
| Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks | May 17, 2023 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking GPU and TPU Performance with Graph Neural Networks | Oct 21, 2022 | Benchmarking, GPU | —Unverified | 0 | 0 |
| MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems | Oct 21, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus | Sep 17, 2020 | Benchmarking, Term Extraction | —Unverified | 0 | 0 |
| mlr3proba: An R Package for Machine Learning in Survival Analysis | Aug 18, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | Jun 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies | Feb 27, 2024 | Benchmarking, Systematic Generalization | —Unverified | 0 | 0 |
| Benchmarking GNNs Using Lightning Network Data | Jul 5, 2024 | Benchmarking | —Unverified | 0 | 0 |
| A dataset for benchmarking vision-based localization at intersections | Nov 4, 2018 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking global optimization techniques for unmanned aerial vehicle path planning | Jan 24, 2025 | Benchmarking, global-optimization | —Unverified | 0 | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding | —Unverified | 0 | 0 |
| MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | Jan 15, 2025 | Benchmarking, Optical Character Recognition (OCR) | —Unverified | 0 | 0 |
| Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | Jun 14, 2024 | Benchmarking, General Knowledge | —Unverified | 0 | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | —Unverified | 0 | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Apr 4, 2025 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | Apr 15, 2024 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | Jan 21, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking General-Purpose In-Context Learning | May 27, 2024 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | May 23, 2025 | Audio Generation, Benchmarking | —Unverified | 0 | 0 |
| MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | May 22, 2025 | Benchmarking, Spatial Reasoning | —Unverified | 0 | 0 |
| MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems | Feb 27, 2025 | Benchmarking, Visual Reasoning | —Unverified | 0 | 0 |
| MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | Sep 19, 2024 | Benchmarking | —Unverified | 0 | 0 |