Multiple-choice

Papers

Showing 411–420 of 1107 papers

Title	Date	Tasks	Status	Hype
Self-Recognition in Language Models	Jul 9, 2024	Multiple-choice	Code Available	0
ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks	Jul 8, 2024	Anomaly Detection, Code Generation	Code Available	1
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?	Jul 7, 2024	Multiple-choice	Code Available	0
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding	Jul 6, 2024	Articles, Instruction Following	Code Available	2
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts	Jul 6, 2024	Logical Reasoning, Mathematical Reasoning	Code Available	1
Are Large Language Models Consistent over Value-laden Questions?	Jul 3, 2024	Multiple-choice	Code Available	0
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?	Jul 2, 2024	Graph Mining, Language Modeling	Code Available	0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models	Jul 2, 2024	Multiple-choice	Unverified	0
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation	Jun 29, 2024	Multiple-choice	Code Available	1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Jun 28, 2024	Multiple-choice, Video Understanding	Code Available	1
