SOTAVerified

Multiple-choice

Papers

Showing 701-725 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | | 0 |
| From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams | | 0 |
| A Data-Driven Study of Commonsense Knowledge using the ConceptNet Knowledge Base | | 0 |
| Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | | 0 |
| Selective Particle Attention: Visual Feature-Based Attention in Deep Reinforcement Learning | | 0 |
| Self-Evaluation Improves Selective Generation in Large Language Models | | 0 |
| Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory | | 0 |
| Self-supervised pre-training and contrastive representation learning for multiple-choice video QA | | 0 |
| Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data | | 0 |
| Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations | | 0 |
| Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering | | 0 |
| Set-LLM: A Permutation-Invariant LLM | | 0 |
| Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | | 0 |
| Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions | | 0 |
| Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations | | 0 |
| Social IQa: Commonsense Reasoning about Social Interactions | | 0 |
| Solving Visual Madlibs with Multiple Cues | | 0 |
| SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0 |
| Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | | 0 |
| Spending Money Wisely: Online Electronic Coupon Allocation based on Real-Time User Intent Detection | | 0 |
| VUDG: A Dataset for Video Understanding Domain Generalization | | 0 |
| SPRITE: A Response Model For Multiple Choice Testing | | 0 |

No leaderboard results yet.