SOTAVerified

Multiple-choice

Papers

Showing 251-275 of 1107 papers

Title | Status | Hype
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Code | 1
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Code | 1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize? | Code | 1
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models | Code | 1
NextLevelBERT: Masked Language Modeling with Higher-Level Representations for Long Documents | Code | 1
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | Code | 1
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Code | 0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
Confident Multiple Choice Learning | Code | 0
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Code | 0
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
LiveQA: A Question Answering Dataset over Sports Live | Code | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback | Code | 0
A Simple Method for Commonsense Reasoning | Code | 0
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0
Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0
A Benchmark for Long-Form Medical Question Answering | Code | 0
Length Optimization in Conformal Prediction | Code | 0
CNN for Text-Based Multiple Choice Question Answering | Code | 0
Page 11 of 45

No leaderboard results yet.