SOTAVerified

Multiple-choice

Papers

Showing 251-275 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Code | 1 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1 |
| Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1 |
| Explicit Planning Helps Language Models in Logical Reasoning | Code | 1 |
| InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1 |
| ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks | Code | 1 |
| CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1 |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | | 0 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | | 0 |
| Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets | | 0 |
| Context Modeling with Evidence Filter for Multiple Choice Question Answering | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | | 0 |
| E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling | | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | | 0 |
| A statistical model for aggregating judgments by incorporating peer predictions | | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | | 0 |
| Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | | 0 |
| Confidence-Aware Learning Assistant | | 0 |
| Comparative Study of Learning Outcomes for Online Learning Platforms | | 0 |
| Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | | 0 |
| An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System | | 0 |
| Combining Multiple Cues for Visual Madlibs Question Answering | | 0 |
| Combinatorial framework for planning in geological exploration | | 0 |
Page 11 of 45

No leaderboard results yet.