SOTAVerified

Multiple-choice

Papers

Showing 451–500 of 1107 papers

Title | Status | Hype
AGReE: A system for generating Automated Grammar Reading Exercises | | 0
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | | 0
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | | 0
Generating Adequate Distractors for Multiple-Choice Questions | | 0
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | | 0
Generating Diagnostic Multiple Choice Comprehension Cloze Questions | | 0
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | | 0
Generating multiple-choice questions for medical question answering with distractors and cue-masking | | 0
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | | 0
Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts | | 0
LLMs May Perform MCQA by Selecting the Least Incorrect Option | | 0
Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | | 0
ELiRF-UPV at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge | | 0
Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | | 0
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | | 0
Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | | 0
GPT-4o System Card | | 0
GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | | 0
Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering | | 0
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions | | 0
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | | 0
GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering | | 0
GraphITE: Estimating Individual Effects of Graph-structured Treatments | | 0
Graph-Structured Representations for Visual Question Answering | | 0
Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | | 0
Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | | 0
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | | 0
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | | 0
A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering | | 0
Eliciting Categorical Data for Optimal Aggregation | | 0
Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents | | 0
Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | | 0
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | | 0
Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | | 0
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | | 0
Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | | 0
Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | | 0
HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension | | 0
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | | 0
HindiLLM: Large Language Model for Hindi | | 0
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | | 0
A Novel Approach for Constrained Optimization in Graphical Models | | 0
AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark | | 0
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0
Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | | 0
Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0
Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | | 0
How well do LLMs reason over tabular data, really? | | 0
E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling | | 0
Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | | 0
Page 10 of 23

No leaderboard results yet.