SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA, among others. Question answering models are typically evaluated with exact match (EM) and F1 metrics. Recent top-performing models include T5 and XLNet.

(Image credit: SQuAD)
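A minimal sketch of how SQuAD-style EM and token-level F1 are commonly computed. The normalization steps here (lowercasing, stripping punctuation and articles, collapsing whitespace) follow the convention popularized by the SQuAD evaluation script, but this is an illustrative reimplementation, not this leaderboard's exact scoring code:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """EM: 1.0 iff the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1 between the predicted and gold answer strings."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` scores 1.0 because articles are stripped during normalization, while a prediction with one extra token still earns partial credit under F1. Benchmark scores like those in the table below are these per-question values averaged over the dataset (with a max taken over multiple gold answers, when available).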

Papers

Showing 5801–5825 of 10817 papers

Title | Hype
Linguistic Embeddings as a Common-Sense Knowledge Repository: Challenges and Opportunities | 0
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual | 0
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering | 0
Identifying the Provision of Choices in Privacy Policy Text | 0
LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs | 0
Identifying Supporting Facts for Multi-hop Question Answering with Document Graph Networks | 0
Identifying Shopping Intent in Product QA for Proactive Recommendations | 0
Linking, Searching, and Visualizing Entities in Wikipedia | 0
Conversational Question Answering on Heterogeneous Sources | 0
LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation | 0
LIORI at SemEval-2021 Task 8: Ask Transformer for measurements | 0
LIPN-CORE: Semantic Text Similarity using n-grams, WordNet, Syntactic Analysis, ESA and Information Retrieval based Features | 0
A Theoretically Grounded Benchmark for Evaluating Machine Commonsense | 0
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | 0
Listening Comprehension over Argumentative Content | 0
Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA | 0
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering | 0
LIST-LUX: Disorder Identification from Clinical Texts | 0
A Deep Cascade Model for Multi-Document Reading Comprehension | 0
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime | 0
mALBERT: Is a Compact Multilingual BERT Model Still Worth It? | 0
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling | 0
Litigation Analytics: Extracting and querying motions and orders from US federal courts | 0
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG | 0
Identifying Purpose Behind Electoral Tweets | 0
Page 233 of 433

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | – | Unverified
2 | FPNet (ensemble) | EM | 90.87 | – | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | – | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | – | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | – | Unverified
6 | FPNet (ensemble) | EM | 90.6 | – | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | – | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | – | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | – | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | – | Unverified