Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
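
As a rough illustration of the two metrics named above, the following is a minimal Python sketch of exact match and token-level F1 for a single prediction and a single reference answer. The normalization steps (lowercasing, stripping punctuation and English articles) follow the convention commonly used for SQuAD-style evaluation, but the function names here are illustrative assumptions, not an official scoring script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers agree; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

Leaderboards usually report these scores as percentages averaged over all questions in a dataset. When a question has several reference answers, the maximum score over the references is typically taken, a step this sketch leaves out.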

(Image credit: SQuAD)

Papers


Showing 5651–5700 of 10817 papers

Title	Date	Tasks	Status
Beyond Attention: Toward Machines with Intrinsic Higher Mental States	May 2, 2025	Question Answering	Unverified
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG	May 24, 2023	Dialogue Generation, Diversity	Unverified
Annotating Educational Questions for Student Response Analysis	May 1, 2018	Question Answering, Word Embeddings	Unverified
A Flexible, Efficient and Accurate Framework for Community Question Answering Pipelines	Jul 1, 2018	Community Question Answering, Question Answering	Unverified
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms	Nov 17, 2024	Diagnostic, Miscellaneous	Unverified
3D Question Answering for City Scene Understanding	Jul 24, 2024	Autonomous Driving, Question Answering	Unverified
Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization	Dec 17, 2023	Chart Question Answering, Question Answering	Unverified
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering	Nov 19, 2024	Fact Checking, Open-Domain Question Answering	Unverified
Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts	Jun 8, 2024	Machine Translation, Multiple-choice	Unverified
Annotating Coordination in the Penn Treebank	Jul 1, 2012	Machine Translation, Question Answering	Unverified
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models	Jul 23, 2024	Language Modelling, Large Language Model	Unverified
Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses	Jun 9, 2024	Question Answering, Semantic Similarity	Unverified
Do Large Language Models Know about Facts?	Oct 8, 2023	Question Answering, Text Generation	Unverified
Better Retrieval May Not Lead to Better Question Answering	May 7, 2022	Information Retrieval, Open-Domain Question Answering	Unverified
Annotating Attribution Relations in Arabic	May 1, 2018	Information Retrieval, Opinion Mining	Unverified
A Flexible and Extensible Framework for Multiple Answer Modes Question Answering	Oct 1, 2021	Answer Generation, Question Answering	Unverified
Do KG-augmented Models Leverage Knowledge as Humans Do?	Jan 17, 2022	Knowledge Graphs, Question Answering	Unverified
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA	Oct 9, 2024	Information Retrieval, Question Answering	Unverified
Do Fine-tuned Commonsense Language Models Really Generalize?	Nov 18, 2020	Multiple-choice, Question Answering	Unverified
Annotating and Predicting Non-Restrictive Noun Phrase Modifications	Aug 1, 2016	Abstractive Text Summarization, Knowledge Base Population	Unverified
Do Explanations make VQA Models more Predictable to a Human?	Oct 29, 2018	Question Answering, Visual Question Answering	Unverified
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs	Jun 5, 2025	cross-modal alignment, Dense Captioning	Unverified
Better Query Graph Selection for Knowledge Base Question Answering	Apr 27, 2022	Knowledge Base Question Answering, Question Answering	Unverified
Affordances in Grounded Language Learning	Jul 1, 2018	Grounded language learning, Question Answering	Unverified
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry	Apr 24, 2024	Information Retrieval, Language Modeling	Unverified
Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance	Jun 26, 2020	Decision Making, Question Answering	Unverified
Does the "most sinfully decadent cake ever" taste good? Answering Yes/No Questions from Figurative Contexts	Sep 24, 2023	Question Answering	Unverified
Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification	Jul 22, 2020	Community Question Answering, Paraphrase Identification	Unverified
Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer	Feb 22, 2024	Generative Question Answering, Hallucination	Unverified
Does Synthetic Data Generation of LLMs Help Clinical Text Mining?	Mar 8, 2023	Code Generation, named-entity-recognition	Unverified
Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering	Oct 19, 2020	Distractor Generation, Language Modeling	Unverified
Does Similarity Matter? The Case of Answer Extraction from Technical Discussion Forums	Dec 1, 2012	Question Answering, Sentence Classification	Unverified
Best Response Shaping	Apr 5, 2024	Deep Reinforcement Learning, Question Answering	Unverified
Does QA-based intermediate training help fine-tuning language models for text classification?	Dec 30, 2021	Classification, Question Answering	Unverified
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?	Jun 20, 2024	Caption Generation, Hallucination	Unverified
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering	Dec 13, 2023	Medical Visual Question Answering, Question Answering	Unverified
Annotate and Identify Modalities, Speech Acts and Finer-Grained Event Types in Chinese Text	Aug 1, 2014	Machine Translation, Question Answering	Unverified
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!	Oct 13, 2020	Diagnostic, Image-text Classification	Unverified
Does Entity Abstraction Help Generative Transformers Reason?	Jan 5, 2022	Conversational Question Answering, Logical Reasoning	Unverified
Best-Answer Prediction in Q&A Sites Using User Information	Dec 15, 2022	Community Question Answering, Question Answering	Unverified
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla	Jul 18, 2023	Multiple-choice, Question Answering	Unverified
ANNA: Enhanced Language Representation for Question Answering	May 1, 2022	Language Modeling, Language Modelling	Unverified
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering	Mar 20, 2025	Contrastive Learning, Question Answering	Unverified
BERT vs GPT for financial engineering	Apr 24, 2024	Machine Translation, Question Answering	Unverified
ANNA: Enhanced Language Representation for Question Answering	Mar 28, 2022	Language Modeling, Language Modelling	Unverified
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations	Aug 30, 2023	Explanation Generation, Question Answering	Unverified
A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers	Jun 3, 2023	Information Retrieval, Knowledge Graph Completion	Unverified
3D Question Answering	Dec 15, 2021	3D geometry, Question Answering	Unverified
Document Visual Question Answering Challenge 2020	Aug 20, 2020	Question Answering, Retrieval	Unverified
Document Structure aware Relational Graph Convolutional Networks for Ontology Population	Apr 27, 2021	Hypernym Discovery, Question Answering	Unverified



Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified