Multiple-choice

Papers

Showing 676–700 of 1107 papers

Title	Date	Tasks	Status	Hype
Language models are susceptible to incorrect patient self-diagnosis in medical applications	Sep 17, 2023	Diagnostic, Multiple-choice	Unverified	0
Self-Assessment Tests are Unreliable Measures of LLM Personality	Sep 15, 2023	Multiple-choice	Unverified	0
SafetyBench: Evaluating the Safety of Large Language Models	Sep 13, 2023	Multiple-choice	Code Available	2
Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions	Sep 12, 2023	Multiple-choice, Sentence	Unverified	0
Use neural networks to recognize students' handwritten letters and incorrect symbols	Sep 12, 2023	Multiple-choice	Unverified	0
Large Language Models Are Not Robust Multiple Choice Selectors	Sep 7, 2023	Computational Efficiency, Multiple-choice	Code Available	1
An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models	Sep 5, 2023	Multiple-choice	Unverified	0
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models	Sep 5, 2023	Code Generation, Multiple-choice	Code Available	1
INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses	Sep 5, 2023	Cell Detection, Lesion Segmentation	Code Available	0
Generalised Winograd Schema and its Contextuality	Aug 31, 2023	coreference-resolution, Coreference Resolution	Unverified	0
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants	Aug 31, 2023	Belebele, Cross-Lingual Transfer	Code Available	2
Spoken Language Intelligence of Large Language Models for Language Learning	Aug 28, 2023	Language Acquisition, Multiple-choice	Code Available	0
Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions	Aug 22, 2023	Multiple-choice, Sensitivity	Unverified	0
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models	Aug 20, 2023	Multiple-choice, Question Answering	Code Available	1
FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models	Aug 19, 2023	Multiple-choice	Code Available	2
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models	Aug 18, 2023	Multiple-choice, Question Answering	Code Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology	Aug 9, 2023	Multiple-choice	Unverified	0
Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning	Aug 7, 2023	In-Context Learning, Math	Code Available	0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	Benchmarking, Information Retrieval	Code Available	0
ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding	Aug 1, 2023	Intent Detection, Multiple-choice	Code Available	0
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choice, Question Answering	Code Available	2
Distractor generation for multiple-choice questions with predictive prompting and large language models	Jul 30, 2023	Distractor Generation, Multiple-choice	Code Available	0
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension	Jul 30, 2023	Benchmarking, Multiple-choice	Code Available	2
A large language model-assisted education tool to provide feedback on open-ended responses	Jul 25, 2023	Language Modeling, Language Modelling	Code Available	0
