| MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding | Mar 11, 2024 | Dialogue Generation, Multiple-choice | —Unverified | 0 | 0 |
| A Method for Building a Commonsense Inference Dataset based on Basic Events | Nov 1, 2020 | Multiple-choice, Transfer Learning | —Unverified | 0 | 0 |
| Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension | Feb 20, 2025 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | Feb 27, 2025 | Math, Medical Question Answering | —Unverified | 0 | 0 |
| AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning | May 16, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces | Mar 8, 2025 | Benchmarking, counterfactual | —Unverified | 0 | 0 |
| AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models | Jun 13, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Meta Sequence Learning for Generating Adequate Question-Answer Pairs | Oct 4, 2020 | Multiple-choice, named-entity-recognition | —Unverified | 0 | 0 |
| MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | Feb 21, 2025 | Benchmarking, Diagnostic | —Unverified | 0 | 0 |
| MIBench: Evaluating Multimodal Large Language Models over Multiple Images | Jul 21, 2024 | In-Context Learning, Multiple-choice | —Unverified | 0 | 0 |
| Use neural networks to recognize students' handwritten letters and incorrect symbols | Sep 12, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| Using contradictions improves question answering systems | Sep 28, 2022 | Multiple-choice, Natural Language Inference | —Unverified | 0 | 0 |
| Using Large Language Models for Automated Grading of Student Writing about Science | Dec 25, 2024 | Astronomy, Multiple-choice | —Unverified | 0 | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | Feb 6, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | Jul 16, 2024 | GPU, Multiple-choice | —Unverified | 0 | 0 |
| Mitigating Bias for Question Answering Models by Tracking Bias Influence | Oct 13, 2023 | Multiple-choice, Multi-Task Learning | —Unverified | 0 | 0 |
| Mitigating Selection Bias with Node Pruning and Auxiliary Options | Sep 27, 2024 | Multiple-choice, Selection bias | —Unverified | 0 | 0 |
| MixQG: Neural Question Generation with Mixed Answer Types | Jan 16, 2022 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training | May 16, 2025 | Multiple-choice, text-classification | —Unverified | 0 | 0 |
| A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education | Dec 5, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| A Joint-Reasoning based Disease Q&A System | Jan 6, 2024 | Knowledge Graphs, Misinformation | —Unverified | 0 | 0 |
| AI-based Arabic Language and Speech Tutor | Oct 22, 2022 | Multiple-choice, Self-Learning | —Unverified | 0 | 0 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | May 29, 2025 | Multiple-choice, Spatial Reasoning | —Unverified | 0 | 0 |
| VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It | Jun 15, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Modeling of Item-Difficulty for Ontology-based MCQs | Jul 4, 2016 | Multiple-choice | —Unverified | 0 | 0 |