SOTAVerified

Multiple-choice

Papers

Showing 551–560 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | | 0 |
| ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies | Code | 1 |
| Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods | | 0 |
| NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism | Code | 1 |
| Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1 |
| NextLevelBERT: Masked Language Modeling with Higher-Level Representations for Long Documents | Code | 1 |
| Unsupervised multiple choices question answering via universal corpus | | 0 |
| Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling | Code | 1 |
| Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | Code | 1 |
| MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property | Code | 1 |

No leaderboard results yet.