SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 621–630 of 1107 papers

Title	Date	Tasks	Status	Hype
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation	Jun 8, 2024	Abstractive Text SummarizationDialogue Generation	—Unverified	0
Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts	Jun 8, 2024	Machine TranslationMultiple-choice	—Unverified	0
CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models	Jun 7, 2024	Multiple-choicePhilosophy	CodeCode Available	0
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs	Jun 7, 2024	Mathematical ReasoningMultiple-choice	CodeCode Available	0
Every Answer Matters: Evaluating Commonsense with Probabilistic Measures	Jun 6, 2024	Common Sense ReasoningLanguage Modeling	CodeCode Available	0
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering	Jun 6, 2024	abstractive question answeringClinical Knowledge	CodeCode Available	0
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?	Jun 6, 2024	Multiple-choiceQuestion Answering	—Unverified	0
Automating Turkish Educational Quiz Generation Using Large Language Models	Jun 5, 2024	Multiple-choice	CodeCode Available	0
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data	Jun 4, 2024	Clinical KnowledgeMultiple-choice	CodeCode Available	0
Order-Independence Without Fine Tuning	Jun 4, 2024	Language ModellingMultiple-choice	CodeCode Available	0

Show:10 25 50

← PrevPage 63 of 111Next →

No leaderboard results yet.