| LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model | Jul 6, 2020 | Common Sense Reasoning, Language Modeling | —Unverified | 0 |
| Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jan 7, 2025 | Machine Translation, Multiple-choice | —Unverified | 0 |
| Unlocking Video-LLM via Agent-of-Thoughts Distillation | Dec 2, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | Mar 23, 2025 | Benchmarking, Chart Question Answering | —Unverified | 0 |
| LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning | Feb 16, 2025 | Analogical questions, In-Context Learning | —Unverified | 0 |
| LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models | Oct 13, 2024 | Multiple-choice | —Unverified | 0 |
| An Add-On for Empowering Google Forms to be an Automatic Question Generator in Online Assessments | Sep 21, 2021 | Multiple-choice | —Unverified | 0 |
| Unsupervised Explanation Generation for Machine Reading Comprehension | Nov 13, 2020 | Explanation Generation, Machine Reading Comprehension | —Unverified | 0 |
| Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning | Nov 16, 2021 | Multiple-choice, Question Answering | —Unverified | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | —Unverified | 0 |
| LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion | Jan 25, 2025 | Multiple-choice, Reading Comprehension | —Unverified | 0 |
| LookAlike: Consistent Distractor Generation in Math MCQs | May 3, 2025 | Distractor Generation, Math | —Unverified | 0 |
| Looking Beyond Sentence-Level Natural Language Inference for Question Answering and Text Summarization | Jun 1, 2021 | Multiple-choice, Natural Language Inference | —Unverified | 0 |
| Looking Beyond Short-Premise Natural Language Inference for Downstream Tasks | Dec 4, 2020 | Multiple-choice, Natural Language Inference | —Unverified | 0 |
| Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning | May 1, 2022 | Multiple-choice, Question Answering | —Unverified | 0 |
| Make a Choice! Knowledge Base Question Answering with In-Context Learning | May 23, 2023 | In-Context Learning, Knowledge Base Question Answering | —Unverified | 0 |
| Amobee at SemEval-2019 Tasks 5 and 6: Multiple Choice CNN Over Contextual Embedding | Apr 17, 2019 | Multiple-choice | —Unverified | 0 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Dec 6, 2024 | 2k, Anomaly Detection | —Unverified | 0 |
| Unsupervised multiple choices question answering via universal corpus | Feb 27, 2024 | Form, Knowledge Graphs | —Unverified | 0 |
| MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks | Jul 3, 2025 | Fairness, Multiple-choice | —Unverified | 0 |
| MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models | Sep 5, 2024 | Multiple-choice | —Unverified | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | May 1, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| MCL-GAN: Generative Adversarial Networks with Multiple Specialized Discriminators | Jul 15, 2021 | Generative Adversarial Network, Multiple-choice | —Unverified | 0 |
| MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels | Feb 20, 2025 | Multiple-choice, Text Generation | —Unverified | 0 |
| MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation | May 13, 2024 | In-Context Learning, Multiple-choice | —Unverified | 0 |