SOTAVerified

Multiple-choice

Papers

Showing 401-450 of 1107 papers

Title | Status | Hype
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking | Code | 0
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
Automating Turkish Educational Quiz Generation Using Large Language Models | Code | 0
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Differentiating Choices via Commonality for Multiple-Choice Question Answering | Code | 0
A large language model-assisted education tool to provide feedback on open-ended responses | Code | 0
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0
Length Optimization in Conformal Prediction | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0
CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Code | 0
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education | Code | 0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0
Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs | Code | 0
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Code | 0
Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? | Code | 0
Are Large Language Models Consistent over Value-laden Questions? | Code | 0
Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A | Code | 0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Code | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0
IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Code | 0
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Code | 0
iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0
Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | Code | 0
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding | Code | 0
Introducing a framework to assess newly created questions with Natural Language Processing | Code | 0
DE-COP: Detecting Copyrighted Content in Language Models Training Data | Code | 0
An Automatic Question Usability Evaluation Toolkit | Code | 0
Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Code | 0
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0
A Profit-Maximizing Strategy for Advertising on the e-Commerce Platforms | Code | 0
Fusing Models with Complementary Expertise | Code | 0
TAXI: Evaluating Categorical Knowledge Editing for Language Models | Code | 0
Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | Code | 0
Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications | Code | 0
Improving Question Answering with External Knowledge | Code | 0
DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine | Code | 0

No leaderboard results yet.