Multiple-choice

Papers

Showing 631–640 of 1107 papers

Title	Date	Tasks	Status	Hype
CLOMO: Counterfactual Logical Modification with Large Language Models	Nov 29, 2023	counterfactual, Counterfactual Reasoning	Code Available	0
SEED-Bench-2: Benchmarking Multimodal Large Language Models	Nov 28, 2023	Benchmarking, Image Generation	Code Available	2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA), Diagnostic	Code Available	2
GPQA: A Graduate-Level Google-Proof Q&A Benchmark	Nov 20, 2023	Multiple-choice	Code Available	2
Downstream Trade-offs of a Family of Text Watermarks	Nov 16, 2023	Form, Language Modelling	Code Available	0
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection	Nov 16, 2023	Language Modeling, Language Modelling	Code Available	4
ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology	Nov 16, 2023	MMLU, Multiple-choice	Unverified	0
Investigating Data Contamination in Modern Benchmarks for Large Language Models	Nov 16, 2023	Common Sense Reasoning, MMLU	Unverified	0
Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset	Nov 14, 2023	Answer Selection, Information Retrieval	Unverified	0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning	Nov 13, 2023	Multiple-choice	Code Available	0

No leaderboard results yet.