SOTAVerified

Multiple-choice

Papers

Showing 1001–1050 of 1107 papers

Title | Status | Hype
Answer-level Calibration for Free-form Multiple Choice Question Answering | Code | 0
Sentence Embeddings for Russian NLU | Code | 0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0
Multimodal Residual Learning for Visual QA | Code | 0
QASC: A Dataset for Question Answering via Sentence Composition | Code | 0
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation | Code | 0
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Code | 0
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures | Code | 0
Evidence Sentence Extraction for Machine Reading Comprehension | Code | 0
BertaQA: How Much Do Language Models Know About Local Culture? | Code | 0
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Code | 0
SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services | Code | 0
BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Code | 0
Quantitative Assessment of Intersectional Empathetic Bias and Understanding | Code | 0
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Code | 0
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Code | 0
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models | Code | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models | Code | 0
Question Answering as Global Reasoning over Semantic Abstractions | Code | 0
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Code | 0
Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions | Code | 0
Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models | Code | 0
An Automatic Question Usability Evaluation Toolkit | Code | 0
SocialIQA: Commonsense Reasoning about Social Interactions | Code | 0
Questioning the Survey Responses of Large Language Models | Code | 0
Exposing the Limits of Video-Text Models through Contrast Sets | Code | 0
Extracting Keywords from Open-Ended Business Survey Questions | Code | 0
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Code | 0
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education | Code | 0
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | Code | 0
Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A | Code | 0
FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT | Code | 0
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0
TRACE: Transformer-based Risk Assessment for Clinical Evaluation | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores | Code | 0
Length Optimization in Conformal Prediction | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
Training-free LLM Merging for Multi-task Learning | Code | 0
Solving and Generating NPR Sunday Puzzles with Large Language Models | Code | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions | Code | 0
Solving Machine Learning Problems | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0

No leaderboard results yet.