Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 501–525 of 1107 papers

Title	Date	Tasks	Status	Hype
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom	Apr 30, 2024	ImplicaturesMultiple-choice	CodeCode Available	1
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models	Apr 29, 2024	Common Sense ReasoningMultiple-choice	—Unverified	0
PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games	Apr 26, 2024	Decision MakingLanguage Modeling	CodeCode Available	2
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic	Apr 26, 2024	BelebeleExtractive Question-Answering	CodeCode Available	0
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4kLanguage Modeling	—Unverified	0
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Apr 25, 2024	BenchmarkingMultiple-choice	CodeCode Available	3
AI and Machine Learning for Next Generation Science Assessments	Apr 23, 2024	Multiple-choice	—Unverified	0
TAXI: Evaluating Categorical Knowledge Editing for Language Models	Apr 23, 2024	knowledge editingMultiple-choice	CodeCode Available	0
UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions	Apr 20, 2024	Data AugmentationMultiple-choice	CodeCode Available	0
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor GenerationMath	—Unverified	0
Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing	Apr 18, 2024	HallucinationMultiple-choice	—Unverified	0
BLINK: Multimodal Large Language Models Can See but Not Perceive	Apr 18, 2024	Depth EstimationMultiple-choice	—Unverified	0
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models	Apr 17, 2024	Language ModelingLanguage Modelling	—Unverified	0
Question Difficulty Ranking for Multiple-Choice Reading Comprehension	Apr 16, 2024	Multiple-choiceReading Comprehension	—Unverified	0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think	Apr 12, 2024	Multiple-choice	CodeCode Available	0
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models	Apr 11, 2024	Multiple-choiceReading Comprehension	CodeCode Available	0
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering	Apr 9, 2024	EgoSchemaMultiple-choice	—Unverified	0
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Apr 8, 2024	GPUMultiple-choice	CodeCode Available	3
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models	Apr 7, 2024	Benchmarkingknowledge editing	CodeCode Available	0
Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Apr 5, 2024	Multiple-choiceNavigate	—Unverified	0
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language ModelingLanguage Modelling	CodeCode Available	4
NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA	Apr 4, 2024	Multiple-choice	CodeCode Available	0
CSEPrompts: A Benchmark of Introductory Computer Science Prompts	Apr 3, 2024	Multiple-choice	CodeCode Available	0
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Apr 2, 2024	Distractor GenerationIn-Context Learning	CodeCode Available	0
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles	Apr 1, 2024	Common Sense ReasoningMultiple-choice	CodeCode Available	0

Show:10 25 50

← PrevPage 21 of 45Next →

No leaderboard results yet.