| MTG: A Benchmarking Suite for Multilingual Text Generation | Oct 16, 2021 | Benchmarking, Question Generation | —Unverified | 0 |
| MTLens: Machine Translation Output Debugging | Jun 1, 2022 | Benchmarking, Machine Translation | —Unverified | 0 |
| MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark | Aug 21, 2020 | Benchmarking, Semantic Parsing | —Unverified | 0 |
| Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | Jan 29, 2024 | Benchmarking, Image Comprehension | —Unverified | 0 |
| Mukayese: Turkish NLP Strikes Back | Nov 16, 2021 | Benchmarking, Language Modeling | —Unverified | 0 |
| Multicalibration for Confidence Scoring in LLMs | Apr 6, 2024 | Benchmarking, Question Answering | —Unverified | 0 |
| Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking | Jul 21, 2016 | Action Recognition, Benchmarking | —Unverified | 0 |
| Multi-channel deep convolutional neural networks for multi-classifying thyroid disease | Mar 6, 2022 | Benchmarking, Binary Classification | —Unverified | 0 |
| Multiclass Optimal Classification Trees with SVM-splits | Nov 16, 2021 | Benchmarking, Classification | —Unverified | 0 |
| Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | Dec 17, 2024 | Benchmarking | —Unverified | 0 |