Multiple-choice

Papers


Showing 651–700 of 1107 papers

Title	Date	Tasks	Status
Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions	May 6, 2024	Decision Making, Multiple-choice	Code Available
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choice, Video Understanding	Unverified
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration	May 1, 2024	Language Modeling, Language Modelling	Unverified
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models	Apr 29, 2024	Common Sense Reasoning, Multiple-choice	Unverified
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic	Apr 26, 2024	Belebele, Extractive Question-Answering	Code Available
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Unverified
TAXI: Evaluating Categorical Knowledge Editing for Language Models	Apr 23, 2024	knowledge editing, Multiple-choice	Code Available
AI and Machine Learning for Next Generation Science Assessments	Apr 23, 2024	Multiple-choice	Unverified
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions	Apr 20, 2024	Data Augmentation, Multiple-choice	Code Available
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor Generation, Math	Unverified
Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing	Apr 18, 2024	Hallucination, Multiple-choice	Unverified
BLINK: Multimodal Large Language Models Can See but Not Perceive	Apr 18, 2024	Depth Estimation, Multiple-choice	Unverified
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models	Apr 17, 2024	Language Modeling, Language Modelling	Unverified
Question Difficulty Ranking for Multiple-Choice Reading Comprehension	Apr 16, 2024	Multiple-choice, Reading Comprehension	Unverified
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think	Apr 12, 2024	Multiple-choice	Code Available
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models	Apr 11, 2024	Multiple-choice, Reading Comprehension	Code Available
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering	Apr 9, 2024	EgoSchema, Multiple-choice	Unverified
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models	Apr 7, 2024	Benchmarking, knowledge editing	Code Available
Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Apr 5, 2024	Multiple-choice, Navigate	Unverified
NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA	Apr 4, 2024	Multiple-choice	Code Available
CSEPrompts: A Benchmark of Introductory Computer Science Prompts	Apr 3, 2024	Multiple-choice	Code Available
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Apr 2, 2024	Distractor Generation, In-Context Learning	Code Available
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles	Apr 1, 2024	Common Sense Reasoning, Multiple-choice	Code Available
Can multiple-choice questions really be useful in detecting the abilities of LLMs?	Mar 26, 2024	Multiple-choice, Question Answering	Code Available
Pragmatic Competence Evaluation of Large Language Models for the Korean Language	Mar 19, 2024	Few-Shot Learning, Multiple-choice	Code Available
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models	Mar 19, 2024	Multiple-choice	Unverified
Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering	Mar 17, 2024	Event Causality Identification, Multiple-choice	Unverified
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Mar 15, 2024	Few-Shot Image Classification, image-classification	Unverified
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models	Mar 15, 2024	Miscellaneous, Multiple-choice	Code Available
Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings	Mar 14, 2024	Multiple-choice, Time Series	Code Available
Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge	Mar 14, 2024	Multiple-choice	Unverified
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic	Mar 14, 2024	Ethics, Multiple-choice	Unverified
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension	Mar 12, 2024	Language Model Evaluation, Language Modeling	Unverified
MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding	Mar 11, 2024	Dialogue Generation, Multiple-choice	Unverified
Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5	Mar 4, 2024	Multiple-choice, Part-Of-Speech Tagging	Unverified
An Improved Traditional Chinese Evaluation Suite for Foundation Model	Mar 4, 2024	Multiple-choice, Question Answering	Unverified
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations	Mar 3, 2024	MedQA, MMLU	Unverified
Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment	Mar 3, 2024	Cloze Test, Multiple-choice	Unverified
Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods	Mar 1, 2024	Multiple-choice	Unverified
Unsupervised multiple choices question answering via universal corpus	Feb 27, 2024	Form, Knowledge Graphs	Unverified
Biomedical Entity Linking as Multiple Choice Question Answering	Feb 23, 2024	Entity Linking, Multiple-choice	Code Available
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models	Feb 22, 2024	Multiple-choice, Text Generation	Code Available
Identifying Multiple Personalities in Large Language Models with External Evaluation	Feb 22, 2024	Multiple-choice	Unverified
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models	Feb 21, 2024	Multiple-choice	Unverified
Ranking Large Language Models without Ground Truth	Feb 21, 2024	Multiple-choice, Triplet	Unverified
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge	Feb 21, 2024	4k, Multiple-choice	Unverified
Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A	Feb 20, 2024	Language Modelling, Large Language Model	Code Available
Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities	Feb 20, 2024	Multiple-choice, Text Simplification	Unverified
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?	Feb 19, 2024	Decision Making, Memorization	Code Available
Stick to your Role! Stability of Personal Values Expressed in Large Language Models	Feb 19, 2024	Multiple-choice	Unverified


