SOTAVerified

Multiple-choice

Papers

Showing 476-500 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | Code | 0 |
| DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension | | 0 |
| Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | | 0 |
| Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | Code | 0 |
| Eliciting Informative Text Evaluations with Large Language Models | Code | 0 |
| Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation | | 0 |
| Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation | Code | 2 |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | Code | 1 |
| Robust portfolio optimization model for electronic coupon allocation | | 0 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1 |
| Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0 |
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3 |
| COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | | 0 |
| AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning | | 0 |
| CinePile: A Long Video Question Answering Dataset and Benchmark | | 0 |
| SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | Code | 1 |
| MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation | | 0 |
| Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0 |
| THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models | Code | 1 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | | 0 |
| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | Code | 0 |
| Self-Reflection in LLM Agents: Effects on Problem-Solving Performance | Code | 2 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0 |
Page 20 of 45

No leaderboard results yet.