SOTAVerified

Multiple-choice

Papers

Showing 1026–1050 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| SocialIQA: Commonsense Reasoning about Social Interactions | Code | 0 |
| Questioning the Survey Responses of Large Language Models | Code | 0 |
| Exposing the Limits of Video-Text Models through Contrast Sets | Code | 0 |
| Extracting Keywords from Open-Ended Business Survey Questions | Code | 0 |
| Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Code | 0 |
| Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Code | 0 |
| Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Code | 0 |
| Learning to Reuse Distractors to support Multiple Choice Question Generation in Education | Code | 0 |
| "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | Code | 0 |
| Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A | Code | 0 |
| FAT ALBERT: Finding Answers in Large Texts using Semantic Similarity Attention Layer based on BERT | Code | 0 |
| Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0 |
| TRACE: Transformer-based Risk Assessment for Clinical Evaluation | Code | 0 |
| LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0 |
| Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores | Code | 0 |
| Length Optimization in Conformal Prediction | Code | 0 |
| FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0 |
| Training-free LLM Merging for Multi-task Learning | Code | 0 |
| Solving and Generating NPR Sunday Puzzles with Large Language Models | Code | 0 |
| HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models | Code | 0 |
| Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0 |
| UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions | Code | 0 |
| Solving Machine Learning Problems | Code | 0 |
| Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? | Code | 0 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0 |
Page 42 of 45

No leaderboard results yet.