
Multiple-choice

Papers

Showing 391-400 of 1107 papers

Title | Status | Hype
MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels | | 0
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | | 0
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | | 0
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh | | 0
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora | | 0
Towards Geo-Culturally Grounded LLM Generations | | 0
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities | | 0
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | | 0
Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | | 0
Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering | | 0
Page 40 of 111

No leaderboard results yet.