SOTAVerified

Multiple-choice

Papers

Showing 351-400 of 1107 papers

Title | Status | Hype
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Code | 0
MedG-KRP: Medical Graph Knowledge Representation Probing | Code | 0
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | Code | 0
EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Code | 0
Measuring Agreeableness Bias in Multimodal Models | Code | 0
ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Code | 0
Eliciting Informative Text Evaluations with Large Language Models | Code | 0
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0
A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian | Code | 0
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | Code | 0
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Code | 0
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Code | 0
BertaQA: How Much Do Language Models Know About Local Culture? | Code | 0
LiveQA: A Question Answering Dataset over Sports Live | Code | 0
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting | Code | 0
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0
Length Optimization in Conformal Prediction | Code | 0
Neural Natural Logic Inference for Interpretable Question Answering | Code | 0
Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Code | 0
DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition | Code | 0
DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Code | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
An Information-Theoretic Approach to Analyze NLP Classification Tasks | Code | 0
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Code | 0
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures | Code | 0
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0
Sentence Embeddings for Russian NLU | Code | 0
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Code | 0
Distractor generation for multiple-choice questions with predictive prompting and large language models | Code | 0
Distractor Generation for Multiple Choice Questions Using Learning to Rank | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0
DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking | Code | 0
Iterative Forward Tuning Boosts In-Context Learning in Language Models | Code | 0
Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
Page 8 of 23

No leaderboard results yet.