SOTAVerified

Multiple-choice

Papers

Showing 171–180 of 1107 papers

Title | Status | Hype
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance | Code | 1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Code | 1
BiMediX: Bilingual Medical Mixture of Experts LLM | Code | 1
Language Model Uncertainty Quantification with Attention Chain | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1

No leaderboard results yet.