SOTAVerified

Multiple-choice

Papers

Showing 731–740 of 1107 papers

Title | Status | Hype
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models | Code | 1
A quantitative study of NLP approaches to question difficulty estimation | Code | 0
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | Code | 3
EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Code | 1
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic | Code | 1
Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | | 0
Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers | | 0
Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies | | 0
Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning | | 0

No leaderboard results yet.