| Predicting the Difficulty of Multiple Choice Questions in a High-stakes Medical Exam | Aug 1, 2019 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods | Mar 1, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability | Nov 10, 2024 | Multiple-choice, Text Generation | —Unverified | 0 | 0 |
| Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning | Apr 14, 2023 | Multiple-choice, Prompt Engineering | —Unverified | 0 | 0 |
| Prompting Implicit Discourse Relation Annotation | Feb 7, 2024 | Classification, Implicit Discourse Relation Classification | —Unverified | 0 | 0 |
| Instruction Fine-Tuning: Does Prompt Loss Matter? | Jan 24, 2024 | Multiple-choice, token-classification | —Unverified | 0 | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | Nov 7, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| ConceptPsy:A Benchmark Suite with Conceptual Comprehensiveness in Psychology | Nov 16, 2023 | MMLU, Multiple-choice | —Unverified | 0 | 0 |
| PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities | Jan 13, 2024 | Instruction Following, Multiple-choice | —Unverified | 0 | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Sep 30, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | Jan 1, 2025 | Multiple-choice, Video Generation | —Unverified | 0 | 0 |
| QOG:Question and Options Generation based on Language Model | Jun 18, 2024 | Information Retrieval, Language Modeling | —Unverified | 0 | 0 |
| QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism | Jun 19, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| VisNumBench: Evaluating Number Sense of Multimodal Large Language Models | Mar 19, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Query Rewriting for Retrieval-Augmented Large Language Models | May 23, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Question Difficulty Ranking for Multiple-Choice Reading Comprehension | Apr 16, 2024 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Question-type Identification for Academic Questions in Online Learning Platform | Nov 24, 2022 | Binary Classification, Multiple-choice | —Unverified | 0 | 0 |
| Visual7W: Grounded Question Answering in Images | Nov 11, 2015 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| Ranking Facts for Explaining Answers to Elementary Science Questions | Oct 18, 2021 | Interpretable Machine Learning, Learning-To-Rank | —Unverified | 0 | 0 |
| Ranking Large Language Models without Ground Truth | Feb 21, 2024 | Multiple-choice, Triplet | —Unverified | 0 | 0 |
| Read, Retrospect, Select: An MRC Framework to Short Text Entity Linking | Jan 7, 2021 | Entity Linking, Machine Reading Comprehension | —Unverified | 0 | 0 |
| RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care | Jun 17, 2023 | Decision Making, graph construction | —Unverified | 0 | 0 |
| Receptivity of an AI Cognitive Assistant by the Radiology Community: A Report on Data Collected at RSNA | Sep 13, 2020 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Recurrent and Contextual Models for Visual Question Answering | Mar 23, 2017 | Diversity, Multiple-choice | —Unverified | 0 | 0 |
| Visual Madlibs: Fill in the Blank Description Generation and Question Answering | Dec 1, 2015 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |