SOTAVerified

Multiple-choice

Papers

Showing 501–525 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1 |
| FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | | 0 |
| PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games | Code | 2 |
| From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Code | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | | 0 |
| SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Code | 3 |
| AI and Machine Learning for Next Generation Science Assessments | | 0 |
| TAXI: Evaluating Categorical Knowledge Editing for Language Models | Code | 0 |
| UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions | Code | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0 |
| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | | 0 |
| BLINK: Multimodal Large Language Models Can See but Not Perceive | | 0 |
| ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models | | 0 |
| Question Difficulty Ranking for Multiple-Choice Reading Comprehension | | 0 |
| Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0 |
| Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Code | 0 |
| MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | | 0 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Code | 3 |
| MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | | 0 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Code | 4 |
| NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA | Code | 0 |
| CSEPrompts: A Benchmark of Introductory Computer Science Prompts | Code | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Code | 0 |
| AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Code | 0 |
Page 21 of 45

No leaderboard results yet.