SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models are typically evaluated with metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
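As a rough illustration of the two metrics named above, the following Python sketch computes exact match and token-level F1 for a single predicted answer against a single reference. The normalization steps (lowercasing, removing punctuation and English articles, collapsing whitespace) are an assumption modeled on the convention commonly used for extractive QA benchmarks such as SQuAD. The function names are illustrative and are not taken from any particular evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumed normalization; individual benchmarks may differ in detail.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

Leaderboards usually average these per-question scores over the whole dataset and report them as percentages. When a question has several reference answers, the maximum score over the references is typically taken.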


Papers

Showing 1351-1375 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications | Code | 1 |
| PADL: Language-Directed Physics-Based Character Control | Code | 1 |
| Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1 |
| Semantic Parsing for Conversational Question Answering over Knowledge Graphs | Code | 1 |
| A Comparative Study of Pretrained Language Models for Long Clinical Text | Code | 1 |
| ViDeBERTa: A powerful pre-trained language model for Vietnamese | Code | 1 |
| SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images | Code | 1 |
| Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering | Code | 1 |
| Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text | Code | 1 |
| SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph | Code | 1 |
| Variational Causal Inference Network for Explanatory Visual Question Answering | Code | 1 |
| VQACL: A Novel Visual Question Answering Continual Learning Setting | Code | 1 |
| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1 |
| Rethinking with Retrieval: Faithful Large Language Model Inference | Code | 1 |
| Large Language Models Encode Clinical Knowledge | Code | 1 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | Code | 1 |
| Parallel Context Windows for Large Language Models | Code | 1 |
| Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework | Code | 1 |
| Are Deep Neural Networks SMARTer than Second Graders? | Code | 1 |
| MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | Code | 1 |
| Visconde: Multi-document QA with GPT-3 and Neural Reranking | Code | 1 |
| Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments | Code | 1 |
| Evaluating Human-Language Model Interaction | Code | 1 |
| Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model | Code | 1 |
| Self-Prompting Large Language Models for Zero-Shot Open-Domain QA | Code | 1 |

Benchmark Results

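| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |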