SOTAVerified

Multiple-choice

Papers

Showing 676–700 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models | | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | | 0 |
| Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | | 0 |
| EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Code | 0 |
| Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings | Code | 0 |
| Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | | 0 |
| AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic | | 0 |
| Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | | 0 |
| MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding | | 0 |
| Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5 | | 0 |
| An Improved Traditional Chinese Evaluation Suite for Foundation Model | | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | | 0 |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | | 0 |
| Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods | | 0 |
| Unsupervised multiple choices question answering via universal corpus | | 0 |
| Biomedical Entity Linking as Multiple Choice Question Answering | Code | 0 |
| "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | Code | 0 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | | 0 |
| Ranking Large Language Models without Ground Truth | | 0 |
| KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | | 0 |
| Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A | Code | 0 |
| Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities | | 0 |
| Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | Code | 0 |
| Stick to your Role! Stability of Personal Values Expressed in Large Language Models | | 0 |

No leaderboard results yet.