SOTAVerified

Multiple-choice

Papers

Showing 611-620 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | | 0 |
| No Task Left Behind: Multi-Task Learning of Knowledge Tracing and Option Tracing for Better Student Assessment | | 0 |
| Note on Combinatorial Engineering Frameworks for Hierarchical Modular Systems | | 0 |
| Note on Evolution and Forecasting of Requirements: Communications Example | | 0 |
| Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning | | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | | 0 |
| Objective quantification of mood states using large language models | | 0 |
| OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities | | 0 |
| OLMES: A Standard for Language Model Evaluations | | 0 |
| OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | | 0 |

No leaderboard results yet.