SOTAVerified

Multiple-choice

Papers

Showing 251–300 of 1107 papers

Title | Status | Hype
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning | Code | 1
MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property | Code | 1
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | Code | 1
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models | Code | 1
Option Tracing: Beyond Binary Knowledge Tracing | Code | 1
The Effect of Sampling Temperature on Problem Solving in Large Language Models | Code | 1
Unsupervised Commonsense Question Answering with Self-Talk | Code | 1
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Code | 0
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Code | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0
Confident Multiple Choice Learning | Code | 0
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Code | 0
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Code | 0
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Code | 0
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | Code | 0
A Simple Method for Commonsense Reasoning | Code | 0
MedG-KRP: Medical Graph Knowledge Representation Probing | Code | 0
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Code | 0
A Benchmark for Long-Form Medical Question Answering | Code | 0
Measuring Agreeableness Bias in Multimodal Models | Code | 0
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0
CNN for Text-Based Multiple Choice Question Answering | Code | 0
A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education | Code | 0
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | Code | 0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0
CLOMO: Counterfactual Logical Modification with Large Language Models | Code | 0
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
LiveQA: A Question Answering Dataset over Sports Live | Code | 0
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
Chain-of-Exemplar: Enhancing Distractor Generation for Multimodal Educational Question Generation | Code | 0
Are Large Language Models Consistent over Value-laden Questions? | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Length Optimization in Conformal Prediction | Code | 0
CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models | Code | 0
Abductive Commonsense Reasoning | Code | 0
A large language model-assisted education tool to provide feedback on open-ended responses | Code | 0
Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at Each Single-Hop? | Code | 0
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0

No leaderboard results yet.