SOTAVerified

Multiple-choice

Papers

Showing 951–1000 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Code | 0 |
| DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | Code | 0 |
| EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | Code | 0 |
| MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Code | 0 |
| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | Code | 0 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Code | 0 |
| Pragmatic Competence Evaluation of Large Language Models for the Korean Language | Code | 0 |
| Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks? | Code | 0 |
| Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Code | 0 |
| Precise Task Formalization Matters in Winograd Schema Evaluations | Code | 0 |
| Towards a Unified Multimodal Reasoning Framework | Code | 0 |
| IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Code | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0 |
| Eliciting Informative Text Evaluations with Large Language Models | Code | 0 |
| ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Code | 0 |
| Self-Recognition in Language Models | Code | 0 |
| EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0 |
| Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Code | 0 |
| Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Code | 0 |
| Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | Code | 0 |
| Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0 |
| Iterative Forward Tuning Boosts In-Context Learning in Language Models | Code | 0 |
| Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at Each Single-Hop? | Code | 0 |
| BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | Code | 0 |
| It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0 |
| Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension | Code | 0 |
| Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0 |
| Enhancing textual textbook question answering with large language models and retrieval augmented generation | Code | 0 |
| Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Code | 0 |
| KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0 |
| AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Code | 0 |
| Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare | Code | 0 |
| Uncertainty quantification in fine-tuned LLMs using LoRA ensembles | Code | 0 |
| Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings | Code | 0 |
| Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Code | 0 |
| VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | Code | 0 |
| Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Code | 0 |
| Evaluating Large Language Model Biases in Persona-Steered Generation | Code | 0 |
| SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM | Code | 0 |
| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0 |
| M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Code | 0 |
| Order-Independence Without Fine Tuning | Code | 0 |
| Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings | Code | 0 |
| PROST: Physical Reasoning of Objects through Space and Time | Code | 0 |
| VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence | Code | 0 |
| Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting | Code | 0 |
| This Land is Your, My Land: Evaluating Geopolitical Biases in Language Models | Code | 0 |
| Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks | Code | 0 |
| Multi-class Hierarchical Question Classification for Multiple Choice Science Exams | Code | 0 |

No leaderboard results yet.