| A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Jan 13, 2024 | Distractor Generation, Multiple-choice | Code Available | 0 |
| Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Oct 3, 2023 | Misconceptions, Multiple-choice | Code Available | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Jun 13, 2024 | Benchmarking, Hallucination | Code Available | 0 |
| ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | May 24, 2023 | Multiple-choice, Question Answering | Code Available | 0 |
| CLOMO: Counterfactual Logical Modification with Large Language Models | Nov 29, 2023 | counterfactual, Counterfactual Reasoning | Code Available | 0 |
| IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs | Nov 12, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 |
| DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? | Jun 18, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | Dec 26, 2023 | Computer Security, Multiple-choice | Code Available | 0 |
| What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks? | Jun 1, 2021 | Multiple-choice, Natural Language Understanding | Code Available | 0 |
| Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Jul 16, 2023 | Multiple-choice | Code Available | 0 |
| TAXI: Evaluating Categorical Knowledge Editing for Language Models | Apr 23, 2024 | knowledge editing, Multiple-choice | Code Available | 0 |
| WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging | Feb 25, 2025 | MMLU, Multiple-choice | Code Available | 0 |
| What Makes Reading Comprehension Questions Easier? | Aug 28, 2018 | Machine Reading Comprehension, Multiple-choice | Code Available | 0 |
| Downstream Trade-offs of a Family of Text Watermarks | Nov 16, 2023 | Form, Language Modelling | Code Available | 0 |
| Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches | May 18, 2025 | Fairness, Memorization | Code Available | 0 |
| A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | Dec 13, 2024 | EEG, Head Pose Estimation | Code Available | 0 |
| Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning | Oct 9, 2024 | Hallucination, Multiple-choice | Code Available | 0 |
| Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Jul 7, 2024 | Multiple-choice | Code Available | 0 |
| Differentiating Choices via Commonality for Multiple-Choice Question Answering | Aug 21, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 0 |
| Utilizing Background Knowledge for Robust Reasoning over Traffic Situations | Dec 4, 2022 | Knowledge Graphs, Multiple-choice | Code Available | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice | Code Available | 0 |
| Improving Machine Reading Comprehension with General Reading Strategies | Oct 31, 2018 | ARC, Language Modeling | Code Available | 0 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking | Sep 26, 2024 | Distractor Generation, Multiple-choice | Code Available | 0 |
| Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Mar 5, 2025 | In-Context Learning, Multiple-choice | Code Available | 0 |
| Improving Question Answering with External Knowledge | Feb 3, 2019 | ARC, Multiple-choice | Code Available | 0 |
| Distractor Generation for Multiple Choice Questions Using Learning to Rank | Jun 1, 2018 | BIG-bench Machine Learning, Distractor Generation | Code Available | 0 |
| Distractor generation for multiple-choice questions with predictive prompting and large language models | Jul 30, 2023 | Distractor Generation, Multiple-choice | Code Available | 0 |
| A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Jan 15, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Nov 1, 2021 | multimodal interaction, Multiple-choice | Code Available | 0 |
| INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Sep 5, 2023 | Cell Detection, Lesion Segmentation | Code Available | 0 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Jun 27, 2024 | Distractor Generation, Math | Code Available | 0 |
| DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Oct 2, 2024 | Multiple-choice, parameter-efficient fine-tuning | Code Available | 0 |
| DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition | Dec 23, 2019 | Action Recognition, Multiple-choice | Code Available | 0 |
| Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning | Oct 6, 2024 | Multiple-choice | Code Available | 0 |
| Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs | Jan 10, 2025 | Multiple-choice | Code Available | 0 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math, Multiple-choice | Code Available | 0 |
| Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Dec 13, 2024 | Multiple-choice | Code Available | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Jun 25, 2024 | ARC, Benchmarking | Code Available | 0 |
| POE: Process of Elimination for Multiple Choice Reasoning | Oct 24, 2023 | In-Context Learning, Logical Reasoning | Code Available | 0 |
| When Retriever-Reader Meets Scenario-Based Multiple-Choice Questions | Aug 31, 2021 | Multiple-choice, Question Answering | Code Available | 0 |
| UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Jun 18, 2024 | Benchmarking, Multiple-choice | Code Available | 0 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Apr 7, 2024 | Benchmarking, knowledge editing | Code Available | 0 |
| A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Aug 7, 2018 | Decoder, Multiple-choice | Code Available | 0 |
| Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Aug 25, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Aug 4, 2023 | Benchmarking, Information Retrieval | Code Available | 0 |
| A Profit-Maximizing Strategy for Advertising on the e-Commerce Platforms | Oct 31, 2022 | Management, Multiple-choice | Code Available | 0 |
| DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Feb 1, 2019 | Dialogue Understanding, Multiple-choice | Code Available | 0 |
| Introducing a framework to assess newly created questions with Natural Language Processing | Apr 28, 2020 | Multiple-choice | Code Available | 0 |
| Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Oct 2, 2024 | Multiple-choice | Code Available | 0 |