SOTAVerified

Multiple-choice

Papers

Showing 401-425 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0 |
| KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Code | 0 |
| Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Code | 0 |
| A quantitative study of NLP approaches to question difficulty estimation | Code | 0 |
| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0 |
| Differentiating Choices via Commonality for Multiple-Choice Question Answering | Code | 0 |
| Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0 |
| Exposing the Limits of Video-Text Models through Contrast Sets | Code | 0 |
| Extracting Keywords from Open-Ended Business Survey Questions | Code | 0 |
| Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0 |
| Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at Each Single-Hop? | Code | 0 |
| Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Code | 0 |
| It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0 |
| KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0 |
| IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Code | 0 |
| CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0 |
| Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Code | 0 |
| Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs | Code | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Code | 0 |
| Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0 |
| Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Code | 0 |
| DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? | Code | 0 |
| Introducing a framework to assess newly created questions with Natural Language Processing | Code | 0 |
| Iterative Forward Tuning Boosts In-Context Learning in Language Models | Code | 0 |

No leaderboard results yet.