SOTAVerified

Multiple-choice

Papers

Showing 451–460 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | | 0 |
| IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models | | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | | 0 |
| A review of faithfulness metrics for hallucination assessment in Large Language Models | | 0 |
| AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | | 0 |
| Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs | | 0 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0 |
| EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | | 0 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | | 0 |
| HindiLLM: Large Language Model for Hindi | | 0 |

No leaderboard results yet.