SOTAVerified

Multiple-choice

Papers

Showing 651-700 of 1107 papers

Title | Status | Hype
Predicting the Difficulty of Multiple Choice Questions in a High-stakes Medical Exam | | 0
Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods | | 0
Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability | | 0
Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning | | 0
Prompting Implicit Discourse Relation Annotation | | 0
Instruction Fine-Tuning: Does Prompt Loss Matter? | | 0
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | | 0
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | | 0
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities | | 0
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | | 0
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | | 0
QOG: Question and Options Generation based on Language Model | | 0
QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism | | 0
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models | | 0
Query Rewriting for Retrieval-Augmented Large Language Models | | 0
Question Difficulty Ranking for Multiple-Choice Reading Comprehension | | 0
Question-type Identification for Academic Questions in Online Learning Platform | | 0
Visual7W: Grounded Question Answering in Images | | 0
Ranking Facts for Explaining Answers to Elementary Science Questions | | 0
Ranking Large Language Models without Ground Truth | | 0
Read, Retrospect, Select: An MRC Framework to Short Text Entity Linking | | 0
RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care | | 0
Receptivity of an AI Cognitive Assistant by the Radiology Community: A Report on Data Collected at RSNA | | 0
Recurrent and Contextual Models for Visual Question Answering | | 0
Visual Madlibs: Fill in the Blank Description Generation and Question Answering | | 0
Rethinking AI Cultural Alignment | | 0
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0
Reusing Swedish FrameNet for training semantic roles | | 0
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions | | 0
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge | | 0
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation | | 0
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | | 0
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | | 0
Robust portfolio optimization model for electronic coupon allocation | | 0
Visual Madlibs: Fill in the blank Image Generation and Question Answering | | 0
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation | | 0
Adversarial Training for Machine Reading Comprehension with Virtual Embeddings | | 0
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | | 0
Visual Question Answering as Reading Comprehension | | 0
Adversarial Databases Improve Success in Retrieval-based Large Language Models | | 0
SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search | | 0
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models | | 0
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning | | 0
SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia | | 0
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models | | 0
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark | | 0
Scene Restoring for Narrative Machine Reading Comprehension | | 0
Scheduling Algorithms for Federated Learning with Minimal Energy Consumption | | 0
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | | 0
GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | | 0

No leaderboard results yet.