SOTAVerified

Multiple-choice

Papers

Showing 701–750 of 1107 papers

Title | Status | Hype
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora | | 0
An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System | | 0
It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education | | 0
Winning Amazon KDD Cup'24 | | 0
KMMLU: Measuring Massive Multitask Language Understanding in Korean | | 0
Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions | | 0
Knowledge Questions from Knowledge Graphs | | 0
Knowledge Retrieval Based on Generative AI | | 0
KoBALT: Korean Benchmark For Advanced Linguistic Tasks | | 0
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | | 0
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | | 0
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | | 0
LAB-Bench: Measuring Capabilities of Language Models for Biology Research | | 0
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs | | 0
Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model | | 0
Language models are susceptible to incorrect patient self-diagnosis in medical applications | | 0
Uncovering Cultural Representation Disparities in Vision-Language Models | | 0
Language Models (Mostly) Know What They Know | | 0
Uncovering Temporal Context for Video Question and Answering | | 0
LAR-ECHR: A New Legal Argument Reasoning Task and Dataset for Cases of the European Court of Human Rights | | 0
Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | | 0
Large Language Models Could Be Rote Learners | | 0
Understanding Dataset Design Choices for Multi-hop Reasoning | | 0
Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code | | 0
Large Language Models Often Know When They Are Being Evaluated | | 0
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | | 0
Large Language Models Still Exhibit Bias in Long Text | | 0
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | | 0
Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes | | 0
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation | | 0
Learning Language-Visual Embedding for Movie Understanding with Natural-Language | | 0
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering | | 0
Learning to Specialize with Knowledge Distillation for Visual Question Answering | | 0
An AI-based Solution for Enhancing Delivery of Digital Learning for Future Teachers | | 0
LegalBench.PT: A Benchmark for Portuguese Law | | 0
Teaching Pretrained Models with Commonsense Reasoning: A Preliminary KB-Based Approach | | 0
WIQA: A dataset for "What if..." reasoning over procedural text | | 0
LEXam: Benchmarking Legal Reasoning on 340 Law Exams | | 0
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models | | 0
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | | 0
Linguistic Legal Concept Extraction in Portuguese | | 0
Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA | | 0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | | 0
LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load | | 0
LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | | 0
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge? | | 0
LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering | | 0
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | | 0
LLMs to Support a Domain Specific Knowledge Assistant | | 0
Page 15 of 23
