SOTAVerified

Multiple-choice

Papers

Showing 501-525 of 1107 papers

Title | Status | Hype
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | | 0
E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling | | 0
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings | | 0
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | | 0
Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | | 0
Identifying Multiple Personalities in Large Language Models with External Evaluation | | 0
Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | | 0
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | | 0
IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | | 0
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | | 0
Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | | 0
ACPBench: Reasoning about Action, Change, and Planning | | 0
E-cheating Prevention Measures: Detection of Cheating at Online Examinations Using Deep Learning Approach -- A Case Study | | 0
Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering | | 0
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | | 0
Dual Co-Matching Network for Multi-choice Reading Comprehension | | 0
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | | 0
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | | 0
DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction | | 0
DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | | 0
AGenT Zero: Zero-shot Automatic Multiple-Choice Question Generation for Skill Assessments | | 0
DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension | | 0
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | | 0
KoBALT: Korean Benchmark For Advanced Linguistic Tasks | | 0
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | | 0

No leaderboard results yet.