SOTAVerified

Multiple-choice

Papers

Showing 626–650 of 1107 papers

Title | Status | Hype
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Code | 0
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? | - | 0
Automating Turkish Educational Quiz Generation Using Large Language Models | Code | 0
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Code | 0
Order-Independence Without Fine Tuning | Code | 0
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors | Code | 0
Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph | - | 0
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning | Code | 0
An Automatic Question Usability Evaluation Toolkit | Code | 0
Evaluating Large Language Model Biases in Persona-Steered Generation | Code | 0
Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | Code | 0
DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension | - | 0
Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | - | 0
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | - | 0
iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0
Eliciting Informative Text Evaluations with Large Language Models | Code | 0
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation | - | 0
Robust portfolio optimization model for electronic coupon allocation | - | 0
Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | - | 0
COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | - | 0
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | - | 0
AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning | - | 0
CinePile: A Long Video Question Answering Dataset and Benchmark | - | 0
MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation | - | 0
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
Page 26 of 45

No leaderboard results yet.