SOTAVerified

Multiple-choice

Papers

Showing 601–625 of 1107 papers

Title | Status | Hype
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0
Fine-tuning BERT with Focus Words for Explanation Regeneration | | 0
An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models | | 0
An Automated Multiple-Choice Question Generation Using Natural Language Processing Techniques | | 0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | | 0
First Token Probability Guided RAG for Telecom Question Answering | | 0
An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering | | 0
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | | 0
Training Optimus Prime, M.D.: Generating Medical Certification Items by Fine-Tuning OpenAI's gpt2 Transformer Model | | 0
ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data | | 0
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | | 0
Framing QA as Building and Ranking Intersentence Answer Justifications | | 0
From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models | | 0
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project | | 0
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0
Fundamental Limitations in Defending LLM Finetuning APIs | | 0
FusionMind -- Improving question and answering with external context fusion | | 0
GANDALF: a General Character Name Description Dataset for Long Fiction | | 0
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | | 0
Generalised Winograd Schema and its Contextuality | | 0
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | | 0
Who did What: A Large-Scale Person-Centered Cloze Dataset | | 0
Generating Adequate Distractors for Multiple-Choice Questions | | 0
Generating Correct Answers for Progressive Matrices Intelligence Tests | | 0

No leaderboard results yet.