Multiple-choice

Papers


Showing 726–750 of 1107 papers

Title	Date	Tasks	Status
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security	Dec 26, 2023	Computer Security, Multiple-choice	Code Available
Towards a Unified Multimodal Reasoning Framework	Dec 22, 2023	Multimodal Reasoning, Multiple-choice	Code Available
Perception Test 2023: A Summary of the First Challenge And Outcome	Dec 20, 2023	Benchmarking, Grounded Video Question Answering	Unverified
BloomVQA: Assessing Hierarchical Multi-modal Comprehension	Dec 20, 2023	Data Augmentation, Memorization	Unverified
Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions	Dec 18, 2023	Multiple-choice, Pedestrian Trajectory Prediction	Code Available
Self-Evaluation Improves Selective Generation in Large Language Models	Dec 14, 2023	Multiple-choice, TruthfulQA	Unverified
A Foundational Multimodal Vision Language AI Assistant for Human Pathology	Dec 13, 2023	Decision Making, Diagnostic	Unverified
A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education	Dec 5, 2023	Multiple-choice	Unverified
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario	Dec 4, 2023	Language Modeling, Language Modelling	Unverified
Explanatory Argument Extraction of Correct Answers in Resident Medical Exams	Dec 1, 2023	Multiple-choice	Code Available
Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension	Nov 30, 2023	Multiple-choice, Reading Comprehension	Unverified
CLOMO: Counterfactual Logical Modification with Large Language Models	Nov 29, 2023	counterfactual, Counterfactual Reasoning	Code Available
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology	Nov 16, 2023	MMLU, Multiple-choice	Unverified
Investigating Data Contamination in Modern Benchmarks for Large Language Models	Nov 16, 2023	Common Sense Reasoning, MMLU	Unverified
Downstream Trade-offs of a Family of Text Watermarks	Nov 16, 2023	Form, Language Modelling	Code Available
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset	Nov 14, 2023	Answer Selection, Information Retrieval	Unverified
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning	Nov 13, 2023	Multiple-choice	Code Available
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks	Nov 9, 2023	Multiple-choice, World Knowledge	Unverified
Assessing Distractors in Multiple-Choice Tests	Nov 8, 2023	Diversity, Multiple-choice	Unverified
Evaluating multiple large language models in pediatric ophthalmology	Nov 7, 2023	Multiple-choice	Unverified
Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions	Nov 5, 2023	Logical Reasoning, Multiple-choice	Unverified
More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems	Nov 3, 2023	Multiple-choice	Unverified
CASE: Commonsense-Augmented Score with an Expanded Answer Space	Nov 3, 2023	Multiple-choice	Code Available
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding	Oct 24, 2023	Language Modeling, Language Modelling	Unverified
POE: Process of Elimination for Multiple Choice Reasoning	Oct 24, 2023	In-Context Learning, Logical Reasoning	Code Available

