SOTAVerified

Multiple-choice

Papers

Showing 376–400 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| Towards Evaluating and Building Versatile Large Language Models for Medicine | Code | 2 |
| Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | - | 0 |
| Differentiating Choices via Commonality for Multiple-Choice Question Answering | Code | 0 |
| How Susceptible are LLMs to Influence in Prompts? | - | 0 |
| Measuring Agreeableness Bias in Multimodal Models | Code | 0 |
| Chain-of-Exemplar: Enhancing Distractor Generation for Multimodal Educational Question Generation | Code | 0 |
| LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs | Code | 1 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | - | 0 |
| LLaVA-OneVision: Easy Visual Task Transfer | Code | 0 |
| Winning Amazon KDD Cup'24 | - | 0 |
| XMainframe: A Large Language Model for Mainframe Modernization | Code | 2 |
| MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models | Code | 2 |
| Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets | - | 0 |
| MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12 |
| MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3 |
| Improved Few-Shot Image Classification Through Multiple-Choice Questions | - | 0 |
| Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | - | 0 |
| MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Code | 2 |
| Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Code | 1 |
| LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Code | 2 |
| Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | - | 0 |
| MIBench: Evaluating Multimodal Large Language Models over Multiple Images | - | 0 |
| Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Code | 0 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | - | 0 |
| Evaluating language models as risk scores | Code | 1 |

No leaderboard results yet.