
Multiple-choice

Papers


Showing 441–450 of 1107 papers

Title	Date	Tasks	Status	Hype
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment	Jun 16, 2024	Action Understanding, Benchmarking	Unverified	0
VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It	Jun 15, 2024	Language Modeling, Language Modelling	Unverified	0
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training	Jun 15, 2024	Domain Adaptation, Language Modeling	Code Available	1
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models	Jun 14, 2024	Multiple-choice, Question Answering	Code Available	2
Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam	Jun 14, 2024	Fairness, Logical Reasoning	Code Available	0
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages	Jun 14, 2024	Multiple-choice	Code Available	1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce	Jun 14, 2024	Multiple-choice, Question Answering	Code Available	1
Bayesian Statistical Modeling with Predictors from LLMs	Jun 13, 2024	Multiple-choice	Unverified	0
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models	Jun 13, 2024	Multiple-choice	Unverified	0
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance	Jun 13, 2024	Multiple-choice, Visual Reasoning	Code Available	1

