SOTAVerified

Multiple-choice

Papers

Showing 701-750 of 1107 papers

Title | Status | Hype
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | | 0
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | | 0
Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | | 0
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | | 0
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams | | 0
A Data-Driven Study of Commonsense Knowledge using the ConceptNet Knowledge Base | | 0
Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models | | 0
Selective Particle Attention: Visual Feature-Based Attention in Deep Reinforcement Learning | | 0
Self-Evaluation Improves Selective Generation in Large Language Models | | 0
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory | | 0
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA | | 0
Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data | | 0
Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations | | 0
Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering | | 0
Set-LLM: A Permutation-Invariant LLM | | 0
Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation | | 0
Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions | | 0
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations | | 0
Social IQa: Commonsense Reasoning about Social Interactions | | 0
Solving Visual Madlibs with Multiple Cues | | 0
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | | 0
Spending Money Wisely: Online Electronic Coupon Allocation based on Real-Time User Intent Detection | | 0
VUDG: A Dataset for Video Understanding Domain Generalization | | 0
SPRITE: A Response Model For Multiple Choice Testing | | 0
Weighted Global Normalization for Multiple Choice Reading Comprehension over Long Documents | | 0
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets | | 0
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | | 0
Statistically Profiling Biases in Natural Language Reasoning Datasets and Models | | 0
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem | | 0
Stick to your Role! Stability of Personal Values Expressed in Large Language Models | | 0
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles | | 0
Adapting Vision-Language Models for Evaluating World Models | | 0
Strategyproof Mean Estimation from Multiple-Choice Questions | | 0
Structured Outputs Enable General-Purpose LLMs to be Medical Experts | | 0
What does BERT Learn from Multiple-Choice Reading Comprehension Datasets? | | 0
Superhuman performance of a large language model on the reasoning tasks of a physician | | 0
What do we expect from Multiple-choice QA Systems? | | 0
What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets | | 0
Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S | | 0
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | | 0
SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages | | 0
TabMCQ: A Dataset of General Knowledge Tables and Multiple-choice Questions | | 0
TA-MAMC at SemEval-2021 Task 4: Task-adaptive Pretraining and Multi-head Attention for Abstract Meaning Reading Comprehension | | 0
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling | | 0
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine | | 0
Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students' (Mis)Understanding Is Hinted | | 0
Empowering Sentence Encoders with Prompting and Label Retrieval for Zero-shot Text Classification | | 0
Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning | | 0
Answering Chinese Elementary School Social Studies Multiple Choice Questions | | 0
Page 15 of 23

No leaderboard results yet.