SOTAVerified

Multiple-choice

Papers

Showing 261–270 of 1107 papers

Title | Status | Hype
Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III |  | 0
OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs |  | 0
Adapting Vision-Language Models for Evaluating World Models |  | 0
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models |  | 0
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? |  | 0
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts |  | 0
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings |  | 0
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding |  | 0
Training-free LLM Merging for Multi-task Learning | Code | 0
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs |  | 0

No leaderboard results yet.