Multiple-choice

Papers


Showing 451–500 of 1107 papers

Title	Date	Tasks	Status
Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering	Jan 1, 2025	Multiple-choice, Question Answering	Unverified
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation	Jan 1, 2025	Language Modeling, Language Modelling	Unverified
Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs	Dec 31, 2024	Conformal Prediction, Decision Making	Unverified
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models	Dec 31, 2024	Multiple-choice, Question Answering	Code Available
A review of faithfulness metrics for hallucination assessment in Large Language Models	Dec 31, 2024	Benchmarking, Hallucination	Unverified
AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects	Dec 31, 2024	Benchmarking, Multiple-choice	Unverified
EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta	Dec 31, 2024	Multiple-choice, Question Answering	Unverified
Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation	Dec 31, 2024	Language Model Evaluation, Language Modeling	Unverified
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity	Dec 30, 2024	Benchmarking, Code Generation	Unverified
HindiLLM: Large Language Model for Hindi	Dec 29, 2024	Language Modeling, Language Modelling	Unverified
Using Large Language Models for Automated Grading of Student Writing about Science	Dec 25, 2024	Astronomy, Multiple-choice	Unverified
In Case You Missed It: ARC 'Challenge' Is Not That Challenging	Dec 23, 2024	ARC, Multiple-choice	Unverified
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation	Dec 16, 2024	Multiple-choice	Unverified
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	Hallucination, Multiple-choice	Unverified
Auto-bidding in real-time auctions via Oracle Imitation Learning (OIL)	Dec 16, 2024	Imitation Learning, Multiple-choice	Unverified
Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models	Dec 15, 2024	Multiple-choice	Unverified
Superhuman performance of a large language model on the reasoning tasks of a physician	Dec 14, 2024	Diagnostic, Language Modeling	Unverified
MedG-KRP: Medical Graph Knowledge Representation Probing	Dec 14, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Code Available
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options	Dec 14, 2024	Multiple-choice	Unverified
Do LLMs Act as Repositories of Causal Knowledge?	Dec 14, 2024	Causal Inference, Multiple-choice	Unverified
A multimodal dataset for understanding the impact of mobile phones on remote online virtual education	Dec 13, 2024	EEG, Head Pose Estimation	Code Available
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing	Dec 13, 2024	GPU, Multiple-choice	Unverified
Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT	Dec 13, 2024	Multiple-choice	Code Available
LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering	Dec 13, 2024	Few-Shot Learning, Knowledge Distillation	Unverified
MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models	Dec 10, 2024	Multiple-choice, Question Answering	Code Available
ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising	Dec 9, 2024	Multiple-choice, Multi-Task Learning	Unverified
Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings	Dec 9, 2024	Multiple-choice	Code Available
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor	Dec 8, 2024	Misconceptions, Multiple-choice	Code Available
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects	Dec 6, 2024	2k, Anomaly Detection	Unverified
GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering	Dec 5, 2024	Information Retrieval, Multiple-choice	Unverified
Establishing Task Scaling Laws via Compute-Efficient Model Ladders	Dec 5, 2024	Language Modeling, Language Modelling	Unverified
The use of large language models to enhance cancer clinical trial educational materials	Dec 2, 2024	Misinformation, Multiple-choice	Unverified
Unlocking Video-LLM via Agent-of-Thoughts Distillation	Dec 2, 2024	Language Modeling, Language Modelling	Unverified
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models	Dec 2, 2024	MMLU, Multiple-choice	Code Available
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting	Dec 1, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Code Available
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages	Dec 1, 2024	ARC, Multiple-choice	Unverified
Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments	Nov 30, 2024	Multiple-choice	Unverified
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Nov 29, 2024	Benchmarking, Grounded Video Question Answering	Unverified
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers	Nov 28, 2024	Image Captioning, image-classification	Unverified
Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments	Nov 28, 2024	Multiple-choice	Unverified
Multiple Choice Learning for Efficient Speech Separation with Many Speakers	Nov 27, 2024	Multiple-choice, Speech Separation	Unverified
NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?	Nov 26, 2024	Attribute, Multiple-choice	Unverified
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text	Nov 25, 2024	Language Modeling, Language Modelling	Unverified
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis	Nov 25, 2024	Medical Visual Question Answering, Multiple-choice	Unverified
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset	Nov 23, 2024	Language Modeling, Language Modelling	Unverified
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation	Nov 20, 2024	Chatbot, Multiple-choice	Unverified
Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning	Nov 18, 2024	Logical Reasoning, Multiple-choice	Unverified
A Benchmark for Long-Form Medical Question Answering	Nov 14, 2024	Answer Generation, Form	Code Available
DAHL: Domain-specific Automated Hallucination Evaluation of Long-Form Text through a Benchmark Dataset in Biomedicine	Nov 14, 2024	Form, Hallucination	Code Available
TRACE: Transformer-based Risk Assessment for Clinical Evaluation	Nov 13, 2024	Decision Making, Missing Values	Code Available



No leaderboard results yet.