SOTAVerified

Multiple-choice

Papers

Showing 301–350 of 1107 papers

Title | Status | Hype
Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at Each Single-Hop? | Code | 0
Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Code | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Code | 0
A quantitative study of NLP approaches to question difficulty estimation | Code | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Code | 0
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Code | 0
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Code | 0
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Code | 0
Sentence Embeddings for Russian NLU | Code | 0
BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | Code | 0
PROST: Physical Reasoning of Objects through Space and Time | Code | 0
Answer-level Calibration for Free-form Multiple Choice Question Answering | Code | 0
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Code | 0
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | Code | 0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0
Measuring Agreeableness Bias in Multimodal Models | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
Biomedical Entity Linking as Multiple Choice Question Answering | Code | 0
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | Code | 0
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
LiveQA: A Question Answering Dataset over Sports Live | Code | 0
Eliciting Informative Text Evaluations with Large Language Models | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0
A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Code | 0
Length Optimization in Conformal Prediction | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education | Code | 0
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Code | 0
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian | Code | 0
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | Code | 0
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | Code | 0
BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Code | 0
ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Code | 0
BertaQA: How Much Do Language Models Know About Local Culture? | Code | 0
EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0
Page 7 of 23

No leaderboard results yet.