Multiple-choice

Papers

Showing 501–550 of 1107 papers

Title	Date	Tasks	Status	Hype
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom	Apr 30, 2024	Implicatures, Multiple-choice	Code Available	1
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models	Apr 29, 2024	Common Sense Reasoning, Multiple-choice	Unverified	0
PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games	Apr 26, 2024	Decision Making, Language Modeling	Code Available	2
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic	Apr 26, 2024	Belebele, Extractive Question-Answering	Code Available	0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4k, Language Modeling	Unverified	0
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Apr 25, 2024	Benchmarking, Multiple-choice	Code Available	3
AI and Machine Learning for Next Generation Science Assessments	Apr 23, 2024	Multiple-choice	Unverified	0
TAXI: Evaluating Categorical Knowledge Editing for Language Models	Apr 23, 2024	knowledge editing, Multiple-choice	Code Available	0
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions	Apr 20, 2024	Data Augmentation, Multiple-choice	Code Available	0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor Generation, Math	Unverified	0
Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing	Apr 18, 2024	Hallucination, Multiple-choice	Unverified	0
BLINK: Multimodal Large Language Models Can See but Not Perceive	Apr 18, 2024	Depth Estimation, Multiple-choice	Unverified	0
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models	Apr 17, 2024	Language Modeling, Language Modelling	Unverified	0
Question Difficulty Ranking for Multiple-Choice Reading Comprehension	Apr 16, 2024	Multiple-choice, Reading Comprehension	Unverified	0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think	Apr 12, 2024	Multiple-choice	Code Available	0
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models	Apr 11, 2024	Multiple-choice, Reading Comprehension	Code Available	0
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering	Apr 9, 2024	EgoSchema, Multiple-choice	Unverified	0
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Apr 8, 2024	GPU, Multiple-choice	Code Available	3
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models	Apr 7, 2024	Benchmarking, knowledge editing	Code Available	0
Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Apr 5, 2024	Multiple-choice, Navigate	Unverified	0
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language Modeling, Language Modelling	Code Available	4
NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA	Apr 4, 2024	Multiple-choice	Code Available	0
CSEPrompts: A Benchmark of Introductory Computer Science Prompts	Apr 3, 2024	Multiple-choice	Code Available	0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Apr 2, 2024	Distractor Generation, In-Context Learning	Code Available	0
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles	Apr 1, 2024	Common Sense Reasoning, Multiple-choice	Code Available	0
Latxa: An Open Language Model and Evaluation Suite for Basque	Mar 29, 2024	Language Modeling, Language Modelling	Code Available	1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness	Mar 27, 2024	Large Language Model, Multiple-choice	Code Available	1
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text	Mar 27, 2024	Articles, Language Modeling	Code Available	4
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language Modeling, Language Modelling	Code Available	2
Can multiple-choice questions really be useful in detecting the abilities of LLMs?	Mar 26, 2024	Multiple-choice, Question Answering	Code Available	0
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models	Mar 26, 2024	Code Completion, Few-Shot Learning	Code Available	3
Understanding Long Videos with Multimodal Language Models	Mar 25, 2024	Action Recognition, Fine-grained Action Recognition	Code Available	2
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models	Mar 23, 2024	Common Sense Reasoning, In-Context Learning	Code Available	1
Pragmatic Competence Evaluation of Large Language Models for the Korean Language	Mar 19, 2024	Few-Shot Learning, Multiple-choice	Code Available	0
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models	Mar 19, 2024	Multiple-choice	Unverified	0
Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering	Mar 17, 2024	Event Causality Identification, Multiple-choice	Unverified	0
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Mar 15, 2024	Few-Shot Image Classification, image-classification	Unverified	0
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models	Mar 15, 2024	Miscellaneous, Multiple-choice	Code Available	0
Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings	Mar 14, 2024	Multiple-choice, Time Series	Code Available	0
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic	Mar 14, 2024	Ethics, Multiple-choice	Unverified	0
Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge	Mar 14, 2024	Multiple-choice	Unverified	0
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension	Mar 12, 2024	Language Model Evaluation, Language Modeling	Unverified	0
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs	Mar 12, 2024	Knowledge Graphs, Multiple-choice	Code Available	1
MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding	Mar 11, 2024	Dialogue Generation, Multiple-choice	Unverified	0
Unfamiliar Finetuning Examples Control How Language Models Hallucinate	Mar 8, 2024	MMLU, Multiple-choice	Code Available	1
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning	Mar 5, 2024	Multiple-choice	Code Available	4
An Improved Traditional Chinese Evaluation Suite for Foundation Model	Mar 4, 2024	Multiple-choice, Question Answering	Unverified	0
Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5	Mar 4, 2024	Multiple-choice, Part-Of-Speech Tagging	Unverified	0
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering	Mar 4, 2024	MedQA, MMLU	Code Available	1
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations	Mar 3, 2024	MedQA, MMLU	Unverified	0
