SOTAVerified

Question Answering

Question answering can be divided into domain-specific subtasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with metrics such as exact match (EM) and F1 score. Some recent top-performing models are T5 and XLNet.

(Image credit: SQuAD)
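The EM and F1 metrics mentioned above can be illustrated with a short sketch. It assumes the answer normalization popularized by the SQuAD evaluation script: lowercase, strip punctuation, drop the English articles "a", "an", and "the", and collapse whitespace. EM checks whether the normalized strings are identical. F1 measures token overlap between the prediction and a gold answer. The function names (`normalize_answer`, `exact_match`, `f1_score`, `evaluate`) are illustrative and do not come from any particular library. The rule that an empty answer matches only another empty answer is an assumption modeled on SQuAD 2.0-style unanswerable questions. Other benchmarks may normalize or aggregate differently.

```python
import collections
import re
import string

# Characters removed during normalization (assumed SQuAD-style normalization).
_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Assumption: an empty answer (e.g. "unanswerable") only matches another empty answer.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, gold_answers):
    """Corpus-level EM and F1 as percentages.

    Each question may have several acceptable gold answers; the best-matching
    one is used for that question before averaging over all questions.
    """
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1_score(pred, g) for g in golds)
    n = len(predictions)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}
```

As a usage example, `evaluate(["The Eiffel Tower", "Paris France"], [["Eiffel Tower"], ["Paris"]])` gives an EM of 50.0. The first answer matches exactly after the article is removed, and the second does not. It gives an F1 of about 83.3, because "paris france" still earns partial credit (precision 0.5, recall 1.0) against "paris". This partial credit is why F1 scores are usually higher than EM scores for the same model.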

Papers

Showing 6776–6800 of 10817 papers

Title | Status | Hype
When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset | Code | 1
Contextualized Query Embeddings for Conversational Search | | 0
Generative Context Pair Selection for Multi-hop Question Answering | | 0
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | Code | 0
Case-based Reasoning for Natural Language Queries over Knowledge Bases | | 0
Can NLI Models Verify QA Systems' Predictions? | Code | 1
Cross-Task Generalization via Natural Language Crowdsourcing Instructions | Code | 2
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation | | 0
GooAQ: Open Question Answering with Diverse Answer Types | Code | 1
ASBERT: Siamese and Triplet network embedding for open question answering | | 0
Multi-Perspective Abstractive Answer Summarization | | 0
A Graph-guided Multi-round Retrieval Method for Conversational Open-domain Question Answering | | 0
Explaining Answers with Entailment Trees | Code | 1
Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments | Code | 1
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
Joint Passage Ranking for Diverse Multi-Answer Retrieval | | 0
ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning | Code | 1
Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Code | 1
Capturing Row and Column Semantics in Transformer Based Question Answering over Tables | Code | 1
Multivalent Entailment Graphs for Question Answering | | 0
What to Pre-Train on? Efficient Intermediate Task Selection | Code | 1
Cross-Modal Retrieval Augmentation for Multi-Modal Classification | | 0
IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation | Code | 1
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks | | 0
Editing Factual Knowledge in Language Models | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified