SOTAVerified

Multiple-choice

Papers

Showing 1-10 of 1107 papers

Title | Status | Hype
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | | 0
HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | | 0
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks | | 0
Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0
OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | | 0
Adapting Vision-Language Models for Evaluating World Models | | 0
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models | | 0
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | | 0
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings | | 0

No leaderboard results yet.