SOTAVerified

Multiple-choice

Papers

Showing 901–925 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | | 0 |
| Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text | | 0 |
| Development and Evaluation of a Personalized Computer-aided Question Generation for English Learners to Improve Proficiency and Correct Mistakes | | 0 |
| DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response | | 0 |
| D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model | | 0 |
| DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension | | 0 |
| Instructions and Guide for Diagnostic Questions: The NeurIPS 2020 Education Challenge | | 0 |
| Dialogue-Based Simulation For Cultural Awareness Training | | 0 |
| Dienstplanerstellung in Krankenhaeusern mittels genetischer Algorithmen | | 0 |
| Differentiable Open-Ended Commonsense Reasoning | | 0 |
| Plug-in, Trainable Gate for Streamlining Arbitrary Neural Networks | | 0 |
| Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | | 0 |
| Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities | | 0 |
| Disaggregating Hops: Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at each Hop? | | 0 |
| DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach | | 0 |
| Distractor Analysis and Selection for Multiple-Choice Cloze Questions for Second-Language Learners | | 0 |
| Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation | | 0 |
| Distributional semantics beyond words: Supervised learning of analogy and paraphrase | | 0 |
| DiverseNet: When One Right Answer is not Enough | | 0 |
| DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain | | 0 |
| Document-level Event Factuality Identification via Machine Reading Comprehension Frameworks with Transfer Learning | | 0 |
| Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | | 0 |
| Do Fine-tuned Commonsense Language Models Really Generalize? | | 0 |
| Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales | | 0 |
| Do LLMs Act as Repositories of Causal Knowledge? | | 0 |
Page 37 of 45

No leaderboard results yet.