Multiple-choice

Papers

Showing 751–800 of 1,107 papers

Title | Status | Hype
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond | | 0
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding | Code | 0
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting | | 0
Field-testing items using artificial intelligence: Natural language processing with transformers | | 0
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education | | 0
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0
Mitigating Bias for Question Answering Models by Tracking Bias Influence | | 0
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | | 0
On the Performance of Multimodal Language Models | | 0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Code | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
Fusing Models with Complementary Expertise | Code | 0
Automating question generation from educational text | | 0
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | | 0
Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | | 0
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models | Code | 0
Language models are susceptible to incorrect patient self-diagnosis in medical applications | | 0
Self-Assessment Tests are Unreliable Measures of LLM Personality | | 0
Use neural networks to recognize students' handwritten letters and incorrect symbols | | 0
Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions | | 0
INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Code | 0
An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models | | 0
Generalised Winograd Schema and its Contextuality | | 0
Spoken Language Intelligence of Large Language Models for Language Learning | Code | 0
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | | 0
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | | 0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding | Code | 0
Distractor generation for multiple-choice questions with predictive prompting and large language models | Code | 0
A large language model-assisted education tool to provide feedback on open-ended responses | Code | 0
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | | 0
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Code | 0
Analyzing Multiple-Choice Reading and Listening Comprehension Tests | | 0
Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications | Code | 0
Structured Dialogue Discourse Parsing | Code | 0
Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | | 0
Solving and Generating NPR Sunday Puzzles with Large Language Models | Code | 0
RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care | | 0
Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses | | 0
Can ChatGPT pass the Vietnamese National High School Graduation Examination? | | 0
Questioning the Survey Responses of Large Language Models | Code | 0
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | | 0
Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis | | 0
BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | Code | 0
Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | Code | 0
Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | | 0
ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | Code | 0
This Land is Your, My Land: Evaluating Geopolitical Biases in Language Models | Code | 0
Page 16 of 23

No leaderboard results yet.