SOTAVerified

Multiple-choice

Papers

Showing 651–675 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | Code | 0 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0 |
| FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | | 0 |
| From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Code | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | | 0 |
| TAXI: Evaluating Categorical Knowledge Editing for Language Models | Code | 0 |
| AI and Machine Learning for Next Generation Science Assessments | | 0 |
| UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions | Code | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0 |
| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | | 0 |
| BLINK: Multimodal Large Language Models Can See but Not Perceive | | 0 |
| ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models | | 0 |
| Question Difficulty Ranking for Multiple-Choice Reading Comprehension | | 0 |
| Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0 |
| Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Code | 0 |
| MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | | 0 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | | 0 |
| NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA | Code | 0 |
| CSEPrompts: A Benchmark of Introductory Computer Science Prompts | Code | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0 |
| AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Code | 0 |
| Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Code | 0 |
| Pragmatic Competence Evaluation of Large Language Models for the Korean Language | Code | 0 |
Page 27 of 45

No leaderboard results yet.