SOTAVerified

Multiple-choice

Papers

Showing 401-450 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | | 0 |
| Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? | | 0 |
| HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method | | 0 |
| Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | | 0 |
| Answering questions by learning to rank - Learning to rank by answering questions | | 0 |
| How Additional Knowledge can Improve Natural Language Commonsense Question Answering? | | 0 |
| HindiLLM: Large Language Model for Hindi | | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | | 0 |
| Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | | 0 |
| BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles | | 0 |
| Establishing Task Scaling Laws via Compute-Efficient Model Ladders | | 0 |
| EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | | 0 |
| FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning | | 0 |
| FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | | 0 |
| Answering questions by learning to rank -- Learning to rank by answering questions | | 0 |
| FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | | 0 |
| Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | | 0 |
| Field-testing items using artificial intelligence: Natural language processing with transformers | | 0 |
| Enhancing Multiple-Choice Question Answering with Causal Knowledge | | 0 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0 |
| Enhancing Multiple-choice Machine Reading Comprehension by Punishing Illogical Interpretations | | 0 |
| Answering Chinese Elementary School Social Studies Multiple Choice Questions | | 0 |
| Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning | | 0 |
| Enhancing LLM Evaluations: The Garbling Trick | | 0 |
| Answering Chinese Elementary School Social Study Multiple Choice Questions | | 0 |
| First Token Probability Guided RAG for Telecom Question Answering | | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | | 0 |
| Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0 |
| ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data | | 0 |
| FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | | 0 |
| AGReE: A system for generating Automated Grammar Reading Exercises | | 0 |
| Framing QA as Building and Ranking Intersentence Answer Justifications | | 0 |
| From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models | | 0 |
| From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project | | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0 |
| SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | | 0 |
| Humanity's Last Exam | | 0 |
| End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | | 0 |
| Fundamental Limitations in Defending LLM Finetuning APIs | | 0 |
| Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | | 0 |
| FusionMind -- Improving question and answering with external context fusion | | 0 |
| Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | | 0 |
| ELiRF-UPV at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge | | 0 |
Page 9 of 23

No leaderboard results yet.