Multiple-choice

Papers

Showing 801–850 of 1107 papers

Title	Date	Tasks	Status
A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions	Oct 1, 2022	Multiple-choice, Prediction	Unverified
Bayesian Statistical Modeling with Predictors from LLMs	Jun 13, 2024	Multiple-choice	Unverified
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets	Apr 24, 2017	Multiple-choice, Question Answering	Unverified
Benchmarking Bias in Large Language Models during Role-Playing	Nov 1, 2024	Benchmarking, Fairness	Unverified
The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models	Oct 12, 2024	Misconceptions, Multiple-choice	Unverified
Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions	Jul 21, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations	Jul 17, 2025	Language Modeling, Language Modelling	Unverified
Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items	Apr 15, 2025	Benchmarking, Multiple-choice	Unverified
Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change	Sep 19, 2023	Generative Question Answering, Information Retrieval	Unverified
Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering	Oct 19, 2020	Distractor Generation, Language Modeling	Unverified
Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare	Oct 24, 2024	Multiple-choice	Unverified
Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization	May 30, 2025	Form, Language Modeling	Unverified
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models	Feb 21, 2024	Multiple-choice	Unverified
Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs	Feb 18, 2025	Generative Question Answering, Multiple-choice	Unverified
Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing	Oct 14, 2024	All, Binary Classification	Unverified
The impact of AI and peer feedback on research writing skills: a study using the CGScholar platform among Kazakhstani scholars	Mar 5, 2025	Multiple-choice, Survey	Unverified
LLMs May Perform MCQA by Selecting the Least Incorrect Option	Feb 2, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions	Oct 24, 2020	General Classification, Multiple-choice	Unverified
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions	Feb 26, 2025	Language Modeling, Language Modelling	Unverified
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination	Sep 19, 2024	General Knowledge, MMLU	Unverified
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory	Mar 13, 2025	Math, Multiple-choice	Unverified
A Novel Approach for Constrained Optimization in Graphical Models	Dec 1, 2020	Multiple-choice	Unverified
BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles	Sep 23, 2021	Multiple-choice, Question Answering	Unverified
The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own	Feb 23, 2025	Multiple-choice	Unverified
BLINK: Multimodal Large Language Models Can See but Not Perceive	Apr 18, 2024	Depth Estimation, Multiple-choice	Unverified
An MRC Framework for Semantic Role Labeling	Jan 16, 2022	Computational Efficiency, Machine Reading Comprehension	Unverified
BloomVQA: Assessing Hierarchical Multi-modal Comprehension	Dec 20, 2023	Data Augmentation, Memorization	Unverified
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs	Feb 6, 2025	Multiple-choice, Sensitivity	Unverified
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?	Sep 3, 2024	Multiple-choice, Question Generation	Unverified
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs	Feb 12, 2025	Multiple-choice, Survey	Unverified
The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts	Feb 3, 2025	Multiple-choice, Reading Comprehension	Unverified
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension	Sep 30, 2020	Machine Reading Comprehension, Multiple-choice	Unverified
Bridging the Language Gap: Knowledge Injected Multilingual Question Answering	Apr 6, 2023	Cross-Lingual Transfer, Extractive Question-Answering	Unverified
Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution	Jun 22, 2023	Multiple-choice	Unverified
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams	Apr 4, 2025	Benchmarking, Management	Unverified
Can ChatGPT pass the Vietnamese National High School Graduation Examination?	Jun 15, 2023	Language Modeling, Language Modelling	Unverified
Can Crowdsourcing be used for Effective Annotation of Arabic?	May 1, 2014	Entity Resolution, Multiple-choice	Unverified
Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?	Mar 16, 2023	Multiple-choice	Unverified
The use of large language models to enhance cancer clinical trial educational materials	Dec 2, 2024	Misinformation, Multiple-choice	Unverified
Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!	Jan 18, 2025	Multiple-choice, Question Answering	Unverified
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer	May 27, 2024	Multiple-choice, Sentiment Analysis	Unverified
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy	Oct 17, 2024	Multiple-choice, Response Generation	Unverified
ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising	Dec 9, 2024	Multiple-choice, Multi-Task Learning	Unverified
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models	Jul 2, 2024	Multiple-choice	Unverified
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	Hallucination, Multiple-choice	Unverified
Changing Answer Order Can Decrease MMLU Accuracy	Jun 27, 2024	MMLU, Multiple-choice	Unverified
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks	Nov 9, 2023	Multiple-choice, World Knowledge	Unverified
What Makes Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types	Sep 17, 2021	Logical Reasoning, Multiple-choice	Unverified
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data	Mar 13, 2025	Large Language Model, Math	Unverified
An Improved Traditional Chinese Evaluation Suite for Foundation Model	Mar 4, 2024	Multiple-choice, Question Answering	Unverified
