SOTAVerified

Multiple-choice

Papers

Showing 426-450 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Enhancing LLM Evaluations: The Garbling Trick | | 0 |
| Answering Chinese Elementary School Social Study Multiple Choice Questions | | 0 |
| First Token Probability Guided RAG for Telecom Question Answering | | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | | 0 |
| Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0 |
| ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data | | 0 |
| FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | | 0 |
| AGReE: A system for generating Automated Grammar Reading Exercises | | 0 |
| Framing QA as Building and Ranking Intersentence Answer Justifications | | 0 |
| From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models | | 0 |
| From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project | | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0 |
| SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0 |
| Humanity's Last Exam | | 0 |
| End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | | 0 |
| Fundamental Limitations in Defending LLM Finetuning APIs | | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | | 0 |
| FusionMind -- Improving question and answering with external context fusion | | 0 |
| Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | | 0 |
| ELiRF-UPV at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge | | 0 |

No leaderboard results yet.