SOTAVerified

Multiple-choice

Papers

Showing 151–200 of 1107 papers

Title | Status | Hype
Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling | Code | 1
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Code | 1
Constructing Narrative Event Evolutionary Graph for Script Event Prediction | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
Leveraging Large Language Models for Multiple Choice Question Answering | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
An MRC Framework for Semantic Role Labeling | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
LifeQA: A Real-life Dataset for Video Question Answering | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
Leaf: Multiple-Choice Question Generation | Code | 1
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1
A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding | Code | 1
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Code | 1
An Open Source Data Contamination Report for Large Language Models | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
Latxa: An Open Language Model and Evaluation Suite for Basque | Code | 1
Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1
Marathon: A Race Through the Realm of Long Context with Large Language Models | Code | 1
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1
BiMediX: Bilingual Medical Mixture of Experts LLM | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
Large Language Models Encode Clinical Knowledge | Code | 1
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages | Code | 1
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation | Code | 1
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning | Code | 1
A Few More Examples May Be Worth Billions of Parameters | Code | 1
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Code | 1
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models | Code | 1
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward | Code | 1
Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation | Code | 1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Code | 1
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | Code | 1
Explaining NLP Models via Minimal Contrastive Editing (MiCE) | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies. | Code | 1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1

No leaderboard results yet.