| Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | Aug 22, 2023 | Multiple-choice, Sensitivity | —Unverified | 0 |
| Large Language Models Still Exhibit Bias in Long Text | Oct 23, 2024 | Fairness, Multiple-choice | —Unverified | 0 |
| A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | Aug 9, 2023 | Multiple-choice | —Unverified | 0 |
| Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes | Oct 3, 2022 | Decision Making, Multiple-choice | —Unverified | 0 |
| Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation | Mar 14, 2021 | Language Modeling, Language Modelling | —Unverified | 0 |
| Learning Language-Visual Embedding for Movie Understanding with Natural-Language | Sep 26, 2016 | Multiple-choice, Retrieval | —Unverified | 0 |
| Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering | Apr 16, 2016 | General Classification, Human-Object Interaction Detection | —Unverified | 0 |
| Learning to Specialize with Knowledge Distillation for Visual Question Answering | Dec 1, 2018 | General Classification, General Knowledge | —Unverified | 0 |
| An AI-based Solution for Enhancing Delivery of Digital Learning for Future Teachers | Nov 9, 2021 | Multiple-choice, Question Generation | —Unverified | 0 |
| LegalBench.PT: A Benchmark for Portuguese Law | Feb 22, 2025 | Multiple-choice | —Unverified | 0 |
| Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach | Sep 20, 2019 | Few-Shot Learning, Logical Reasoning | —Unverified | 0 |
| WIQA: A dataset for "What if..." reasoning over procedural text | Nov 1, 2019 | Multiple-choice | —Unverified | 0 |
| LEXam: Benchmarking Legal Reasoning on 340 Law Exams | May 19, 2025 | Benchmarking, Legal Reasoning | —Unverified | 0 |
| LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models | Mar 19, 2024 | Multiple-choice | —Unverified | 0 |
| WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | May 20, 2025 | Mathematical Reasoning, Multiple-choice | —Unverified | 0 |
| Linguistic Legal Concept Extraction in Portuguese | Oct 22, 2018 | Ethics, Multiple-choice | —Unverified | 0 |
| Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA | Oct 3, 2024 | Multiple-choice, Question Answering | —Unverified | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | Sep 25, 2024 | Chatbot, GSM8K | —Unverified | 0 |
| LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | Sep 17, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load | May 4, 2025 | Articles, Multiple-choice | —Unverified | 0 |
| LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | Dec 13, 2024 | Few-Shot Learning, Knowledge Distillation | —Unverified | 0 |
| Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? | May 5, 2025 | Multiple-choice | —Unverified | 0 |
| LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | Jan 25, 2025 | Information Retrieval, Multiple-choice | —Unverified | 0 |
| Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | Dec 4, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| LLMs to Support a Domain Specific Knowledge Assistant | Feb 6, 2025 | Chatbot, Multiple-choice | —Unverified | 0 |