SOTAVerified

Multiple-choice

Papers

Showing 631-640 of 1107 papers

Title | Status | Hype
CLOMO: Counterfactual Logical Modification with Large Language Models | Code | 0
SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2
GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Code | 2
Downstream Trade-offs of a Family of Text Watermarks | Code | 0
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Code | 4
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | | 0
Investigating Data Contamination in Modern Benchmarks for Large Language Models | | 0
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | | 0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0

No leaderboard results yet.