SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with metrics such as exact match (EM) and F1. Recent top-performing models include T5 and XLNet.

(Image credit: SQuAD)
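The EM and F1 metrics mentioned above are usually computed SQuAD-style: answers are normalized (lowercased, punctuation and articles stripped), EM checks for an exact string match, and F1 measures token overlap between prediction and gold answer. A minimal sketch, assuming the standard SQuAD-style normalization (function names here are illustrative, not from any particular library):

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace
    (the normalization convention used by SQuAD-style evaluation)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    the multiset overlap of normalized tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower!", "eiffel tower")` is 1.0 after normalization, while a longer prediction like "eiffel tower in paris" against gold "the eiffel tower" gets partial F1 credit (precision 0.5, recall 1.0). Benchmarks with multiple gold answers per question typically take the maximum score over all references.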

Papers

Showing 9751–9800 of 10817 papers

What Question Answering can Learn from Trivia Nerds
What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning
What's in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams
What's in Your Head? Emergent Behaviour in Multi-Task Transformer Models
What Would a Teacher Do? Predicting Future Talk Moves
What Would it Take to get Biomedical QA Systems into Practice?
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
When are Lemons Purple? The Concept Association Bias of Vision-Language Models
When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus
When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
When to Read Documents or QA History: On Unified and Selective Open-domain QA
When to Speak, When to Abstain: Contrastive Decoding with Abstention
When Two LLMs Debate, Both Think They'll Win
Where is Linked Data in Question Answering over Linked Data?
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
Where To Look: Focus Regions for Visual Question Answering
Where Was Alexander the Great in 325 BC? Toward Understanding History Text with a World Model
Where Was COVID-19 First Discovered? Designing a Question-Answering System for Pandemic Situations
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
Which Step Do I Take First? Troubleshooting with Bayesian Models
"Who was Pietro Badoglio?" Towards a QA system for Italian History
Why Artificial Intelligence Needs a Task Theory --- And What It Might Look Like
Why "Blow Out"? A Structural Analysis of the Movie Dialog Dataset
Why can't memory networks read effectively?
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Why Does a Visual Question Have Different Answers?
Why Does ChatGPT Fall Short in Providing Truthful Answers?
Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference
Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Why-Question Answering using Intra- and Inter-Sentential Causal Relations
Why Question Answering using Sentiment Analysis and Word Classes
Why Settle for Just One? Extending EL++ Ontology Embeddings with Many-to-Many Relationships
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
Wikidata as a seed for Web Extraction
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts
WikiOmnia: generative QA corpus on the whole Russian Wikipedia
WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval
WikiQA: A Challenge Dataset for Open-Domain Question Answering
WikiTalk: A Spoken Wikipedia-based Open-Domain Knowledge Access System
WikiWhy: Answering and Explaining Cause-and-Effect Questions
WildQA: In-the-Wild Video Question Answering
Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts
Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents
Page 196 of 217

Benchmark Results

#  | Model                                           | Metric | Claimed | Verified | Status
1  | IE-Net (ensemble)                               | EM     | 90.94   |          | Unverified
2  | FPNet (ensemble)                                | EM     | 90.87   |          | Unverified
3  | IE-NetV2 (ensemble)                             | EM     | 90.86   |          | Unverified
4  | SA-Net on Albert (ensemble)                     | EM     | 90.72   |          | Unverified
5  | SA-Net-V2 (ensemble)                            | EM     | 90.68   |          | Unverified
6  | FPNet (ensemble)                                | EM     | 90.6    |          | Unverified
7  | Retro-Reader (ensemble)                         | EM     | 90.58   |          | Unverified
8  | EntitySpanFocusV2 (ensemble)                    | EM     | 90.52   |          | Unverified
9  | TransNets + SFVerifier + SFEnsembler (ensemble) | EM     | 90.49   |          | Unverified
10 | EntitySpanFocus+AT (ensemble)                   | EM     | 90.45   |          | Unverified