Multiple-choice

Papers

Showing 591–600 of 1107 papers

Title	Date	Tasks	Status	Hype
Exposing the Limits of Video-Text Models through Contrast Sets	Jan 16, 2022	Language Modeling, Language Modelling	Unverified	0
Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History	Jan 15, 2025	Multiple-choice, Question Answering	Unverified	0
FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees	Nov 4, 2024	Multiple-choice, Question Answering	Unverified	0
Towards Multistage Design of Modular Systems	Jun 19, 2013	Multiple-choice	Unverified	0
FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning	Aug 29, 2019	Diagnostic, Multiple-choice	Unverified	0
FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models	Apr 20, 2025	Descriptive, Ethics	Unverified	0
Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction	Jan 28, 2025	Logical Reasoning, Multiple-choice	Unverified	0
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Mar 19, 2025	Benchmarking, Multiple-choice	Unverified	0
Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models	Mar 15, 2024	Few-Shot Image Classification, image-classification	Unverified	0
Field-testing items using artificial intelligence: Natural language processing with transformers	Oct 18, 2023	Multiple-choice	Unverified	0

