Multiple-choice

Papers

Showing 581–590 of 1107 papers

Title	Date	Tasks	Status	Hype
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge	Feb 12, 2024	General Knowledge, Multiple-choice	Code Available	2
The Effect of Sampling Temperature on Problem Solving in Large Language Models	Feb 7, 2024	Multiple-choice, Prompt Engineering	Code Available	1
Prompting Implicit Discourse Relation Annotation	Feb 7, 2024	Classification, Implicit Discourse Relation Classification	Unverified	0
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models	Feb 7, 2024	Diversity, Multiple-choice	Code Available	2
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark	Feb 6, 2024	Multiple-choice, Question Answering	Unverified	0
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification	Feb 6, 2024	Benchmarking, Multiple-choice	Unverified	0
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models	Feb 6, 2024	Attribute, Face Anti-Spoofing	Code Available	1
Enhancing textual textbook question answering with large language models and retrieval augmented generation	Feb 5, 2024	Multiple-choice, Question Answering	Code Available	0
LLMs May Perform MCQA by Selecting the Least Incorrect Option	Feb 2, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified	0
Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation	Feb 2, 2024	Distractor Generation, Multiple-choice	Unverified	0

No leaderboard results yet.