SOTAVerified

Multiple-choice

Papers

Showing 576-600 of 1107 papers

Title | Status | Hype
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets | - | 0
Improved Few-Shot Image Classification Through Multiple-Choice Questions | - | 0
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | - | 0
MIBench: Evaluating Multimodal Large Language Models over Multiple Images | - | 0
Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | - | 0
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Code | 0
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | - | 0
Adversarial Databases Improve Success in Retrieval-based Large Language Models | - | 0
MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | - | 0
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | - | 0
AstroMLab 1: Who Wins Astronomy Jeopardy!? | - | 0
LAB-Bench: Measuring Capabilities of Language Models for Biology Research | - | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
Evaluating Nuanced Bias in Large Language Model Free Response Answers | - | 0
Self-Recognition in Language Models | Code | 0
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Code | 0
Are Large Language Models Consistent over Value-laden Questions? | Code | 0
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | - | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
Changing Answer Order Can Decrease MMLU Accuracy | - | 0
Length Optimization in Conformal Prediction | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | - | 0
SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages | - | 0
Page 24 of 45

No leaderboard results yet.