Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 651–700 of 1107 papers

Title	Date	Tasks	Status	Hype
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding	Oct 24, 2023	Language ModelingLanguage Modelling	—Unverified	0
POE: Process of Elimination for Multiple Choice Reasoning	Oct 24, 2023	In-Context LearningLogical Reasoning	CodeCode Available	0
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond	Oct 23, 2023	counterfactualMultiple-choice	—Unverified	0
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding	Oct 19, 2023	Multiple-choiceNatural Language Understanding	CodeCode Available	0
Field-testing items using artificial intelligence: Natural language processing with transformers	Oct 18, 2023	Multiple-choice	—Unverified	0
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting	Oct 18, 2023	Multiple-choice	—Unverified	0
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education	Oct 18, 2023	Multiple-choiceMultiple Choice Question Answering (MCQA)	—Unverified	0
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning	Oct 16, 2023	Domain AdaptationMedical Question Answering	CodeCode Available	1
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models	Oct 15, 2023	Multiple-choiceTriplet	CodeCode Available	0
Mitigating Bias for Question Answering Models by Tracking Bias Influence	Oct 13, 2023	Multiple-choiceMulti-Task Learning	—Unverified	0
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models	Oct 11, 2023	HallucinationIn-Context Learning	CodeCode Available	1
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models	Oct 8, 2023	Distractor GenerationLanguage Modelling	CodeCode Available	1
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action RecognitionMultiple-choice	—Unverified	0
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models	Oct 5, 2023	Common Sense ReasoningMultiple-choice	CodeCode Available	1
On the Performance of Multimodal Language Models	Oct 4, 2023	BenchmarkingBinary Classification	—Unverified	0
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval	Oct 3, 2023	ArticlesDecision Making	CodeCode Available	0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions	Oct 3, 2023	MisconceptionsMultiple-choice	CodeCode Available	0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation	Oct 3, 2023	Image CaptioningMultiple-choice	CodeCode Available	0
Fusing Models with Complementary Expertise	Oct 2, 2023	Multiple-choicetext-classification	CodeCode Available	0
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations	Oct 2, 2023	In-Context LearningInstruction Following	CodeCode Available	1
Automating question generation from educational text	Sep 26, 2023	Multiple-choiceQuestion Generation	—Unverified	0
HANS, are you clever? Clever Hans Effect Analysis of Neural Systems	Sep 21, 2023	Decision MakingMultiple-choice	—Unverified	0
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models	Sep 19, 2023	Explanation GenerationLanguage Modelling	CodeCode Available	0
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation	Sep 19, 2023	Language Model EvaluationLanguage Modeling	CodeCode Available	1
Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change	Sep 19, 2023	Generative Question AnsweringInformation Retrieval	—Unverified	0
Language models are susceptible to incorrect patient self-diagnosis in medical applications	Sep 17, 2023	DiagnosticMultiple-choice	—Unverified	0
Self-Assessment Tests are Unreliable Measures of LLM Personality	Sep 15, 2023	Multiple-choice	—Unverified	0
SafetyBench: Evaluating the Safety of Large Language Models	Sep 13, 2023	Multiple-choice	CodeCode Available	2
Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions	Sep 12, 2023	Multiple-choiceSentence	—Unverified	0
Use neural networks to recognize students' handwritten letters and incorrect symbols	Sep 12, 2023	Multiple-choice	—Unverified	0
Large Language Models Are Not Robust Multiple Choice Selectors	Sep 7, 2023	Computational EfficiencyMultiple-choice	CodeCode Available	1
An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models	Sep 5, 2023	Multiple-choice	—Unverified	0
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models	Sep 5, 2023	Code GenerationMultiple-choice	CodeCode Available	1
INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses	Sep 5, 2023	Cell DetectionLesion Segmentation	CodeCode Available	0
Generalised Winograd Schema and its Contextuality	Aug 31, 2023	coreference-resolutionCoreference Resolution	—Unverified	0
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants	Aug 31, 2023	BelebeleCross-Lingual Transfer	CodeCode Available	2
Spoken Language Intelligence of Large Language Models for Language Learning	Aug 28, 2023	Language AcquisitionMultiple-choice	CodeCode Available	0
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions	Aug 22, 2023	Multiple-choiceSensitivity	—Unverified	0
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models	Aug 20, 2023	Multiple-choiceQuestion Answering	CodeCode Available	1
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models	Aug 19, 2023	Multiple-choice	CodeCode Available	2
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models	Aug 18, 2023	Multiple-choiceQuestion Answering	CodeCode Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	DiagnosticEgoSchema	CodeCode Available	1
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology	Aug 9, 2023	Multiple-choice	—Unverified	0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning	Aug 7, 2023	In-Context LearningMath	CodeCode Available	0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	BenchmarkingInformation Retrieval	CodeCode Available	0
ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding	Aug 1, 2023	Intent DetectionMultiple-choice	CodeCode Available	0
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choiceQuestion Answering	CodeCode Available	2
Distractor generation for multiple-choice questions with predictive prompting and large language models	Jul 30, 2023	Distractor GenerationMultiple-choice	CodeCode Available	0
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension	Jul 30, 2023	BenchmarkingMultiple-choice	CodeCode Available	2
A large language model-assisted education tool to provide feedback on open-ended responses	Jul 25, 2023	Language ModelingLanguage Modelling	CodeCode Available	0

Show:10 25 50

← PrevPage 14 of 23Next →

No leaderboard results yet.