SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
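As a rough illustration of the two metrics named above, here is a minimal sketch of EM and token-level F1 for a single prediction and reference pair. It assumes the widely used SQuAD-style normalization (lowercasing, removing punctuation and English articles, collapsing whitespace); the function names `normalize`, `exact_match`, and `f1_score` are hypothetical and not taken from any particular benchmark's official scorer, which may differ in detail.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several acceptable reference answers, a common convention is to score the prediction against each reference and keep the maximum, then average over all questions to get the dataset-level figure.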


Papers

Showing 9776-9800 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| "Who was Pietro Badoglio?" Towards a QA system for Italian History | | 0 |
| Why Artificial Intelligence Needs a Task Theory --- And What It Might Look Like | | 0 |
| Why "Blow Out"? A Structural Analysis of the Movie Dialog Dataset | | 0 |
| Why can't memory networks read effectively? | | 0 |
| Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities | | 0 |
| Why Does a Visual Question Have Different Answers? | | 0 |
| Why Does ChatGPT Fall Short in Providing Truthful Answers? | | 0 |
| Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference | | 0 |
| Why Do Masked Neural Language Models Still Need Common Sense Knowledge? | | 0 |
| Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? | | 0 |
| Why-Question Answering using Intra- and Inter-Sentential Causal Relations | | 0 |
| Why Question Answering using Sentiment Analysis and Word Classes | | 0 |
| Why Settle for Just One? Extending EL++ Ontology Embeddings with Many-to-Many Relationships | | 0 |
| Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs | | 0 |
| Wikidata as a seed for Web Extraction | | 0 |
| Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | | 0 |
| WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | | 0 |
| WikiOmnia: generative QA corpus on the whole Russian Wikipedia | | 0 |
| WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval | | 0 |
| WikiQA: A Challenge Dataset for Open-Domain Question Answering | | 0 |
| WikiTalk: A Spoken Wikipedia-based Open-Domain Knowledge Access System | | 0 |
| WikiWhy: Answering and Explaining Cause-and-Effect Questions | | 0 |
| WildQA: In-the-Wild Video Question Answering | | 0 |
| Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts | | 0 |
| Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering | | 0 |

Benchmark Results

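| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |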