SOTAVerified

Multiple-choice

Papers

Showing 626–650 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education | | 0 |
| Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | | 0 |
| Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Code | 0 |
| Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | | 0 |
| Biomedical knowledge graph-optimized prompt generation for large language models | Code | 2 |
| CLOMO: Counterfactual Logical Modification with Large Language Models | Code | 0 |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2 |
| GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Code | 2 |
| Downstream Trade-offs of a Family of Text Watermarks | Code | 0 |
| Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Code | 4 |
| ConceptPsy:A Benchmark Suite with Conceptual Comprehensiveness in Psychology | | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | | 0 |
| Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | | 0 |
| It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0 |
| Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Code | 1 |
| Fake Alignment: Are LLMs Really Aligned Well? | Code | 1 |
| Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks | | 0 |
| Assessing Distractors in Multiple-Choice Tests | | 0 |
| Evaluating multiple large language models in pediatric ophthalmology | | 0 |
| Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | | 0 |
| More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems | | 0 |
| CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0 |
| Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis | Code | 1 |
| An Open Source Data Contamination Report for Large Language Models | Code | 1 |

No leaderboard results yet.