Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1026–1050 of 1107 papers

Title	Date	Tasks	Status
Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning	Oct 21, 2019	Data AugmentationDecision Making	—Unverified
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark	Mar 22, 2025	Multiple-choice	—Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action RecognitionMultiple-choice	—Unverified
GPT-4o System Card	Oct 25, 2024	Multiple-choiceSpatial Reasoning	—Unverified
GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam	Apr 4, 2023	Multiple-choice	—Unverified
Transliteration: A Simple Technique For Improving Multilingual Language Modeling	Sep 29, 2021	Language ModelingLanguage Modelling	—Unverified
True Detective: A Deep Abductive Reasoning Benchmark Undoable for GPT-3 and Challenging for GPT-4	Dec 20, 2022	Multiple-choice	—Unverified
GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering	Dec 5, 2024	Information RetrievalMultiple-choice	—Unverified
GraphITE: Estimating Individual Effects of Graph-structured Treatments	Sep 29, 2020	counterfactualDecision Making	—Unverified
Graph-Structured Representations for Visual Question Answering	Sep 19, 2016	Multiple-choiceQuestion Answering	—Unverified
Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing	Apr 18, 2024	HallucinationMultiple-choice	—Unverified
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation	Jun 2, 2025	Multiple-choiceQuestion Answering	—Unverified
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems	Sep 21, 2023	Decision MakingMultiple-choice	—Unverified
HardML: A Benchmark For Evaluating Data Science And Machine Learning knowledge and reasoning in AI	Jan 26, 2025	MMLUMultiple-choice	—Unverified
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing	Dec 13, 2024	GPUMultiple-choice	—Unverified
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models	Jul 17, 2025	Multiple-choice	—Unverified
Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs	May 24, 2023	Multiple-choice	—Unverified
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?	Jun 6, 2024	Multiple-choiceQuestion Answering	—Unverified
Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies	Apr 15, 2023	Language ModelingLanguage Modelling	—Unverified
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	May 9, 2025	BenchmarkingForm	—Unverified
HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension	Mar 15, 2018	Multiple-choiceReading Comprehension	—Unverified
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation	Jan 12, 2025	AttributeMultiple-choice	—Unverified
HindiLLM: Large Language Model for Hindi	Dec 29, 2024	Language ModelingLanguage Modelling	—Unverified
Analyzing Multiple-Choice Reading and Listening Comprehension Tests	Jul 3, 2023	Multiple-choiceReading Comprehension	—Unverified
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4kLanguage Modeling	—Unverified

Show:10 25 50

← PrevPage 42 of 45Next →

No leaderboard results yet.