SOTAVerified

Multiple-choice

Papers

Showing 901–950 of 1107 papers

Title | Status | Hype
A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Code | 0
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions | Code | 0
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0
ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | Code | 0
CLOMO: Counterfactual Logical Modification with Large Language Models | Code | 0
IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs | Code | 0
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? | Code | 0
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | Code | 0
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks? | Code | 0
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Code | 0
TAXI: Evaluating Categorical Knowledge Editing for Language Models | Code | 0
WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging | Code | 0
What Makes Reading Comprehension Questions Easier? | Code | 0
Downstream Trade-offs of a Family of Text Watermarks | Code | 0
Teach2Eval: An Indirect Evaluation Method for LLM by Judging How It Teaches | Code | 0
A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | Code | 0
Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning | Code | 0
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? | Code | 0
Differentiating Choices via Commonality for Multiple-Choice Question Answering | Code | 0
Utilizing Background Knowledge for Robust Reasoning over Traffic Situations | Code | 0
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Code | 0
Improving Machine Reading Comprehension with General Reading Strategies | Code | 0
A large language model-assisted education tool to provide feedback on open-ended responses | Code | 0
DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking | Code | 0
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Code | 0
Improving Question Answering with External Knowledge | Code | 0
Distractor Generation for Multiple Choice Questions Using Learning to Rank | Code | 0
Distractor generation for multiple-choice questions with predictive prompting and large language models | Code | 0
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Code | 0
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Code | 0
INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Code | 0
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Code | 0
DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Code | 0
DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition | Code | 0
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning | Code | 0
Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs | Code | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0
Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Code | 0
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0
POE: Process of Elimination for Multiple Choice Reasoning | Code | 0
When Retriever-Reader Meets Scenario-Based Multiple-Choice Questions | Code | 0
UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Code | 0
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models | Code | 0
A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Code | 0
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
A Profit-Maximizing Strategy for Advertising on the e-Commerce Platforms | Code | 0
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Code | 0
Introducing a framework to assess newly created questions with Natural Language Processing | Code | 0
Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Code | 0

No leaderboard results yet.