SOTAVerified

Multiple-choice

Papers

Showing 701-750 of 1107 papers

Title | Status | Hype
Uncertainty quantification in fine-tuned LLMs using LoRA ensembles | Code | 0
KMMLU: Measuring Massive Multitask Language Understanding in Korean | | 0
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Code | 0
DE-COP: Detecting Copyrighted Content in Language Models Training Data | Code | 0
Prompting Implicit Discourse Relation Annotation | | 0
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark | | 0
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | | 0
Enhancing textual textbook question answering with large language models and retrieval augmented generation | Code | 0
LLMs May Perform MCQA by Selecting the Least Incorrect Option | | 0
Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation | | 0
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards | Code | 0
An Information-Theoretic Approach to Analyze NLP Classification Tasks | Code | 0
Evaluating LLM -- Generated Multimodal Diagnosis from Medical Images and Symptom Analysis | | 0
Towards Collective Superintelligence: Amplifying Group IQ using Conversational Swarms | | 0
Instruction Fine-Tuning: Does Prompt Loss Matter? | | 0
What Large Language Models Know and What People Think They Know | | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Code | 0
Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | | 0
Automated Answer Validation using Text Similarity | | 0
A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Code | 0
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities | | 0
A Joint-Reasoning based Disease Q&A System | | 0
The Earth is Flat? Unveiling Factual Errors in Large Language Models | | 0
FusionMind -- Improving question and answering with external context fusion | | 0
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | Code | 0
Towards a Unified Multimodal Reasoning Framework | Code | 0
Perception Test 2023: A Summary of the First Challenge And Outcome | | 0
BloomVQA: Assessing Hierarchical Multi-modal Comprehension | | 0
Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions | Code | 0
Self-Evaluation Improves Selective Generation in Large Language Models | | 0
A Foundational Multimodal Vision Language AI Assistant for Human Pathology | | 0
A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education | | 0
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | | 0
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Code | 0
Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | | 0
CLOMO: Counterfactual Logical Modification with Large Language Models | Code | 0
ConceptPsy:A Benchmark Suite with Conceptual Comprehensiveness in Psychology | | 0
Investigating Data Contamination in Modern Benchmarks for Large Language Models | | 0
Downstream Trade-offs of a Family of Text Watermarks | Code | 0
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | | 0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks | | 0
Assessing Distractors in Multiple-Choice Tests | | 0
Evaluating multiple large language models in pediatric ophthalmology | | 0
Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | | 0
More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems | | 0
CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding | | 0
POE: Process of Elimination for Multiple Choice Reasoning | Code | 0

No leaderboard results yet.