| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | Feb 24, 2022 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| Measuring Large Language Models Capacity to Annotate Journalistic Sourcing | Dec 30, 2024 | Benchmarking, Ethics | —Unverified | 0 | 0 |
| Measuring the Complexity of Domains Used to Evaluate AI Systems | Sep 18, 2020 | Benchmarking | —Unverified | 0 | 0 |
| Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | Aug 21, 2023 | Adversarial Robustness, Benchmarking | —Unverified | 0 | 0 |
| Towards Effective Disambiguation for Machine Translation with Large Language Models | Sep 20, 2023 | Benchmarking, In-Context Learning | —Unverified | 0 | 0 |
| MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering | Feb 26, 2025 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| MechProNet: Machine Learning Prediction of Mechanical Properties in Metal Additive Manufacturing | Aug 21, 2022 | Articles, Benchmarking | —Unverified | 0 | 0 |
| Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models | May 22, 2025 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Benchmarking Large Language Models on Homework Assessment in Circuit Analysis | Jun 5, 2025 | Benchmarking | —Unverified | 0 | 0 |
| Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs | Jan 26, 2024 | Benchmarking, Knowledge Graphs | —Unverified | 0 | 0 |