SOTAVerified

Multiple-choice

Papers

Showing 411–420 of 1107 papers

Title | Status | Hype
Self-Recognition in Language Models | Code | 0
ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks | Code | 1
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Code | 0
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding | Code | 2
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Code | 1
Are Large Language Models Consistent over Value-laden Questions? | Code | 0
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? | Code | 0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | | 0
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1

No leaderboard results yet.