SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
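
As a rough illustration of the two metrics named above, here is a minimal sketch of EM and token-level F1 for a single prediction. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles a/an/the); other benchmarks may normalize differently, and the function names are hypothetical rather than taken from any particular evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())  # collapse runs of whitespace


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(round(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"), 4))
```

When a question has several reference answers, a common convention is to take the maximum score over the references and then average over all questions; leaderboard figures are usually that average expressed as a percentage.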


Papers

Showing 10651-10675 of 10817 papers

Title | Status | Hype
Think Visually: Question Answering through Virtual Imagery | Code | 0
Symbolic Priors for RNN-based Semantic Parsing | Code | 0
SyllabusQA: A Course Logistics Question Answering Dataset | Code | 0
Unifying Text, Tables, and Images for Multimodal Question Answering | Code | 0
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models | Code | 0
SwissAlps at SemEval-2017 Task 3: Attention-based Convolutional Neural Network for Community Question Answering | Code | 0
Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering | Code | 0
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering | Code | 0
Thieves on Sesame Street! Model Extraction of BERT-based APIs | Code | 0
SWI: Speaking with Intent in Large Language Models | Code | 0
UNIMELB at SemEval-2016 Tasks 4A and 4B: An Ensemble of Neural Networks and a Word2Vec Based Model for Sentiment Classification | Code | 0
Simple Applications of BERT for Ad Hoc Document Retrieval | Code | 0
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Code | 0
Simple and Effective Text Matching with Richer Alignment Features | Code | 0
They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking | Code | 0
Supervised Knowledge Makes Large Language Models Better In-context Learners | Code | 0
Self Question-answering: Aspect-based Sentiment Analysis by Role Flipped Machine Reading Comprehension | Code | 0
UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding | Code | 0
The TechQA Dataset | Code | 0
The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries | Code | 0
The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing | Code | 0
The representation landscape of few-shot learning and fine-tuning in large language models | Code | 0
UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models | Code | 0
SemEval-2019 Task 10: Math Question Answering | Code | 0
WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified