SOTAVerified

Multiple-choice

Papers

Showing 951–975 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Code | 0 |
| DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | Code | 0 |
| EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | Code | 0 |
| MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0 |
| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | Code | 0 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Code | 0 |
| Pragmatic Competence Evaluation of Large Language Models for the Korean Language | Code | 0 |
| Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks? | Code | 0 |
| Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Code | 0 |
| Precise Task Formalization Matters in Winograd Schema Evaluations | Code | 0 |
| Towards a Unified Multimodal Reasoning Framework | Code | 0 |
| IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Code | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0 |
| Eliciting Informative Text Evaluations with Large Language Models | Code | 0 |
| ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Code | 0 |
| Self-Recognition in Language Models | Code | 0 |
| EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0 |
| Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Code | 0 |
| Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Code | 0 |
| Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | Code | 0 |
| Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0 |
| Iterative Forward Tuning Boosts In-Context Learning in Language Models | Code | 0 |
| Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at Each Single-Hop? | Code | 0 |
| BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | Code | 0 |
Page 39 of 45

No leaderboard results yet.