SOTAVerified

Multiple-choice

Papers

Showing 751–800 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model | | 0 |
| Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | | 0 |
| Unlocking Video-LLM via Agent-of-Thoughts Distillation | | 0 |
| Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | | 0 |
| LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning | | 0 |
| LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models | | 0 |
| An Add-On for Empowering Google Forms to be an Automatic Question Generator in Online Assessments | | 0 |
| Unsupervised Explanation Generation for Machine Reading Comprehension | | 0 |
| Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning | | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | | 0 |
| LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion | | 0 |
| LookAlike: Consistent Distractor Generation in Math MCQs | | 0 |
| Looking Beyond Sentence-Level Natural Language Inference for Question Answering and Text Summarization | | 0 |
| Looking Beyond Short-Premise Natural Language Inference for Downstream Tasks | | 0 |
| Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning | | 0 |
| Make a Choice! Knowledge Base Question Answering with In-Context Learning | | 0 |
| Amobee at SemEval-2019 Tasks 5 and 6: Multiple Choice CNN Over Contextual Embedding | | 0 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | | 0 |
| Unsupervised multiple choices question answering via universal corpus | | 0 |
| MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks | | 0 |
| MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models | | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0 |
| MCL-GAN: Generative Adversarial Networks with Multiple Specialized Discriminators | | 0 |
| MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels | | 0 |
| MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation | | 0 |
| Measuring Semantic Similarity by Latent Relational Analysis | | 0 |
| MedGPT: Medical Concept Prediction from Clinical Narratives | | 0 |
| MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | | 0 |
| MeDiaQA: A Question Answering Dataset on Medical Dialogues | | 0 |
| MedKP: Medical Dialogue with Knowledge Enhancement and Clinical Pathway Encoding | | 0 |
| A Method for Building a Commonsense Inference Dataset based on Basic Events | | 0 |
| Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension | | 0 |
| Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | | 0 |
| AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning | | 0 |
| UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces | | 0 |
| AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models | | 0 |
| Meta Sequence Learning for Generating Adequate Question-Answer Pairs | | 0 |
| MHQA: A Diverse, Knowledge Intensive Mental Health Question Answering Challenge for Language Models | | 0 |
| MIBench: Evaluating Multimodal Large Language Models over Multiple Images | | 0 |
| Use neural networks to recognize students' handwritten letters and incorrect symbols | | 0 |
| Using contradictions improves question answering systems | | 0 |
| Using Large Language Models for Automated Grading of Student Writing about Science | | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | | 0 |
| MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | | 0 |
| Mitigating Bias for Question Answering Models by Tracking Bias Influence | | 0 |
| Mitigating Selection Bias with Node Pruning and Auxiliary Options | | 0 |
| MixQG: Neural Question Generation with Mixed Answer Types | | 0 |
| ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training | | 0 |
| A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education | | 0 |
| A Joint-Reasoning based Disease Q&A System | | 0 |

No leaderboard results yet.