SOTAVerified

Multiple-choice

Papers

Showing 501-550 of 1107 papers

Title | Status | Hype
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | | 0
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | | 0
DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction | | 0
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | | 0
Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | | 0
Identifying Multiple Personalities in Large Language Models with External Evaluation | | 0
Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | | 0
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | | 0
IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | | 0
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | | 0
DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | | 0
AGenT Zero: Zero-shot Automatic Multiple-Choice Question Generation for Skill Assessments | | 0
DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension | | 0
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | | 0
DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples | | 0
Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts | | 0
Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | | 0
Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns | | 0
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | | 0
Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | | 0
Do LLMs Act as Repositories of Causal Knowledge? | | 0
Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales | | 0
Do Fine-tuned Commonsense Language Models Really Generalize? | | 0
An MRC Framework for Semantic Role Labeling | | 0
Linguistic Legal Concept Extraction in Portuguese | | 0
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model | | 0
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | | 0
Benchmarking Bias in Large Language Models during Role-Playing | | 0
Document-level Event Factuality Identification via Machine Reading Comprehension Frameworks with Transfer Learning | | 0
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain | | 0
A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults | | 0
Large Language Models Still Exhibit Bias in Long Text | | 0
DiverseNet: When One Right Answer is not Enough | | 0
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | | 0
Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation | | 0
Distributional semantics beyond words: Supervised learning of analogy and paraphrase | | 0
Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation | | 0
Bayesian Statistical Modeling with Predictors from LLMs | | 0
A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions | | 0
Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code | | 0
Large Language Models Often Know When They Are Being Evaluated | | 0
Distractor Analysis and Selection for Multiple-Choice Cloze Questions for Second-Language Learners | | 0
DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach | | 0
Auxiliary Class Based Multiple Choice Learning | | 0
Disaggregating Hops: Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at each Hop? | | 0
An Improved Traditional Chinese Evaluation Suite for Foundation Model | | 0
A Foundational Multimodal Vision Language AI Assistant for Human Pathology | | 0
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | | 0
Learning Language-Visual Embedding for Movie Understanding with Natural-Language | | 0
Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities | | 0
Page 11 of 23

No leaderboard results yet.