| Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP) | Aug 6, 2023 | Benchmarking, Image Segmentation | —Unverified | 0 |
| MileBench: Benchmarking MLLMs in Long Context | Apr 29, 2024 | Benchmarking, Diagnostic | —Unverified | 0 |
| MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries | May 22, 2025 | Benchmarking, Information Retrieval | —Unverified | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | Jun 26, 2025 | Benchmarking | —Unverified | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | Feb 6, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 |
| Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | Dec 12, 2022 | Benchmarking, Multi-step retrosynthesis | —Unverified | 0 |
| Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning | Dec 18, 2024 | Benchmarking, Position | —Unverified | 0 |
| MIRAI: Evaluating LLM Agents for Event Forecasting | Jul 1, 2024 | Articles, Benchmarking | —Unverified | 0 |
| MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning? | Feb 14, 2025 | Benchmarking, In-Context Learning | —Unverified | 0 |
| Mitigating severe over-parameterization in deep convolutional neural networks through forced feature abstraction and compression with an entropy-based heuristic | Jun 27, 2021 | Benchmarking, Feature Compression | —Unverified | 0 |
| Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | Nov 29, 2023 | Benchmarking, Federated Learning | —Unverified | 0 |
| MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Feb 3, 2025 | Benchmarking, Fairness | —Unverified | 0 |
| MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | Jul 14, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| MLHarness: A Scalable Benchmarking System for MLCommons | Nov 9, 2021 | Benchmarking | —Unverified | 0 |
| MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale | Sep 25, 2019 | Benchmarking | —Unverified | 0 |
| MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale | Feb 19, 2020 | Benchmarking | —Unverified | 0 |
| MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems | Oct 21, 2021 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| mlr3proba: An R Package for Machine Learning in Survival Analysis | Aug 18, 2020 | Benchmarking, BIG-bench Machine Learning | —Unverified | 0 |
| ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | Jun 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | Oct 25, 2024 | Benchmarking, document understanding | —Unverified | 0 |
| MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | Jan 15, 2025 | Benchmarking, Optical Character Recognition (OCR) | —Unverified | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | —Unverified | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Apr 4, 2025 | Benchmarking, Image Generation | —Unverified | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | Apr 15, 2024 | Benchmarking | —Unverified | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | May 23, 2025 | Audio Generation, Benchmarking | —Unverified | 0 |