SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
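As a rough illustration of the two metrics named above, the following sketch computes EM and token-level F1 for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the English articles "a", "an", "the"); the function names are hypothetical, and individual benchmarks may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the normalization commonly used for SQuAD-style
    evaluation; other benchmarks may apply different rules.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower!")` returns `1.0` because both strings normalize to `eiffel tower`, while `f1_score("Eiffel Tower in Paris", "the Eiffel Tower")` gives partial credit (precision 0.5, recall 1.0). Leaderboard figures are typically these per-example scores averaged over a dataset and reported as percentages, often taking the maximum over several reference answers per question.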


Papers

Showing 1851–1900 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval | Code | 1 |
| SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1 |
| Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections | Code | 1 |
| Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering | Code | 1 |
| CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1 |
| Conversational Question Answering over Knowledge Graphs with Transformer and Graph Attention Networks | Code | 1 |
| MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Code | 1 |
| VisQA: X-raying Vision and Language Reasoning in Transformers | Code | 1 |
| MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models | Code | 1 |
| FeTaQA: Free-form Table Question Answering | Code | 1 |
| NLQuAD: A Non-Factoid Long Question Answering Data Set | Code | 1 |
| Are Bias Mitigation Techniques for Deep Learning Effective? | Code | 1 |
| Towards General Purpose Vision Systems | Code | 1 |
| Automatically Generating Cause-and-Effect Questions from Passages | Code | 1 |
| Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers | Code | 1 |
| SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | Code | 1 |
| A Comprehensive Review of the Video-to-Text Problem | Code | 1 |
| On the hidden treasure of dialog in video question answering | Code | 1 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Code | 1 |
| QuestEval: Summarization Asks for Fact-based Evaluation | Code | 1 |
| Multi-Modal Answer Validation for Knowledge-Based VQA | Code | 1 |
| Controllable Generation from Pre-trained Language Models via Inverse Prompting | Code | 1 |
| Cooperative Self-training of Machine Reading Comprehension | Code | 1 |
| Knowledge Graph Question Answering using Graph-Pattern Isomorphism | Code | 1 |
| Hurdles to Progress in Long-form Question Answering | Code | 1 |
| AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents | Code | 1 |
| Logic Embeddings for Complex Query Answering | Code | 1 |
| Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder | Code | 1 |
| SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering | Code | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Code | 1 |
| NoiseQA: Challenge Set Evaluation for User-Centric Question Answering | Code | 1 |
| PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them | Code | 1 |
| Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | Code | 1 |
| Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | Code | 1 |
| Unifying Vision-and-Language Tasks via Text Generation | Code | 1 |
| ChainCQG: Flow-Aware Conversational Question Generation | Code | 1 |
| [Re] Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings | Code | 1 |
| VisualMRC: Machine Reading Comprehension on Document Images | Code | 1 |
| Mitigating the Position Bias of Transformer Models in Passage Re-Ranking | Code | 1 |
| Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching | Code | 1 |
| ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks | Code | 1 |
| TSQA: Tabular Scenario Based Question Answering | Code | 1 |
| Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals | Code | 1 |
| SF-QA: Simple and Fair Evaluation Library for Open-domain Question Answering | Code | 1 |
| Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Code | 1 |
| Personalized Food Recommendation as Constrained Question Answering over a Large-scale Food Knowledge Graph | Code | 1 |
| End-to-End Training of Neural Retrievers for Open-Domain Question Answering | Code | 1 |
| Few-Shot Question Answering by Pretraining Span Selection | Code | 1 |
| CDLM: Cross-Document Language Modeling | Code | 1 |
| Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos | Code | 1 |
Page 38 of 217

Benchmark Results

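| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |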