
Multiple-choice

Papers

Showing papers 526-550 of 1107

Title | Status | Hype
Latxa: An Open Language Model and Evaluation Suite for Basque | Code | 1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Code | 1
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | Code | 4
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Code | 2
Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Code | 0
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Code | 3
Understanding Long Videos with Multimodal Language Models | Code | 2
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Code | 1
Pragmatic Competence Evaluation of Large Language Models for the Korean Language | Code | 0
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models | | 0
Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | | 0
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | | 0
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Code | 0
Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings | Code | 0
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic | | 0
Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | | 0
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding | | 0
Unfamiliar Finetuning Examples Control How Language Models Hallucinate | Code | 1
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Code | 4
An Improved Traditional Chinese Evaluation Suite for Foundation Model | | 0
Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5 | | 0
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering | Code | 1
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | | 0
Page 22 of 45

No leaderboard results yet.