| Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | Jan 1, 2025 | Multiple-choice, Video Generation | — | Unverified | 0 |
| IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models | Jan 1, 2025 | Hallucination, Multiple-choice | — | Unverified | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | Dec 31, 2024 | Language Model Evaluation, Language Modeling | — | Unverified | 0 |
| A review of faithfulness metrics for hallucination assessment in Large Language Models | Dec 31, 2024 | Benchmarking, Hallucination | — | Unverified | 0 |
| AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | Dec 31, 2024 | Benchmarking, Multiple-choice | — | Unverified | 0 |
| Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs | Dec 31, 2024 | Conformal Prediction, Decision Making | — | Unverified | 0 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choice, Question Answering | Code | Code Available | 0 |
| EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | Dec 31, 2024 | Multiple-choice, Question Answering | — | Unverified | 0 |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | Dec 30, 2024 | Benchmarking, Code Generation | — | Unverified | 0 |
| HindiLLM: Large Language Model for Hindi | Dec 29, 2024 | Language Modeling, Language Modelling | — | Unverified | 0 |