SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated on metrics such as exact match (EM) and F1 score. Some recent top-performing models are T5 and XLNet.
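
The EM and F1 metrics can be illustrated with a short sketch. It is modeled on the answer normalization used by the official SQuAD evaluation script (lowercase, strip punctuation, drop the articles "a", "an", "the", collapse whitespace). Other benchmarks may normalize answers differently, and the function names used here (`normalize_answer`, `exact_match`, `f1_score`, `best_score`) are illustrative choices, not the API of any particular library.

```python
import re
import string
from collections import Counter
from typing import Callable, List

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Assumed convention (as for unanswerable questions in SQuAD 2.0):
        # score 1.0 only when both sides are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric: Callable[[str, str], float], prediction: str, golds: List[str]) -> float:
    """Score against every reference answer and keep the best one."""
    return max(metric(prediction, gold) for gold in golds)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "Eiffel Tower"))
    print(f1_score("in Paris, France", "Paris"))
    print(best_score(f1_score, "in Paris, France", ["Paris", "Paris, France"]))
```

Here "The Eiffel Tower" counts as an exact match for "Eiffel Tower" because the article is stripped, while "in Paris, France" earns only partial F1 credit against "Paris" (precision 1/3, recall 1, F1 0.5). Per-question scores are then usually averaged over the whole dataset; the SQuAD script reports that mean multiplied by 100, which is why leaderboard EM values appear as percentages.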


Papers

Showing 1351–1400 of 10817 papers

Title | Status | Hype
--- | --- | ---
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications | Code | 1
PADL: Language-Directed Physics-Based Character Control | Code | 1
Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1
Semantic Parsing for Conversational Question Answering over Knowledge Graphs | Code | 1
A Comparative Study of Pretrained Language Models for Long Clinical Text | Code | 1
ViDeBERTa: A powerful pre-trained language model for Vietnamese | Code | 1
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images | Code | 1
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering | Code | 1
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text | Code | 1
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph | Code | 1
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1
VQACL: A Novel Visual Question Answering Continual Learning Setting | Code | 1
Variational Causal Inference Network for Explanatory Visual Question Answering | Code | 1
Rethinking with Retrieval: Faithful Large Language Model Inference | Code | 1
Large Language Models Encode Clinical Knowledge | Code | 1
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | Code | 1
Parallel Context Windows for Large Language Models | Code | 1
Are Deep Neural Networks SMARTer than Second Graders? | Code | 1
Optimization Techniques for Unsupervised Complex Table Reasoning via Self-Training Framework | Code | 1
Evaluating Human-Language Model Interaction | Code | 1
Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments | Code | 1
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | Code | 1
Visconde: Multi-document QA with GPT-3 and Neural Reranking | Code | 1
Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model | Code | 1
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA | Code | 1
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation | Code | 1
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models | Code | 1
APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning | Code | 1
VindLU: A Recipe for Effective Video-and-Language Pretraining | Code | 1
Hierarchical multimodal transformers for Multi-Page DocVQA | Code | 1
Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | Code | 1
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph | Code | 1
Nonparametric Masked Language Modeling | Code | 1
Relation-Aware Language-Graph Transformer for Question Answering | Code | 1
A Sequential Flow Control Framework for Multi-hop Knowledge Base Question Answering | Code | 1
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning | Code | 1
AIONER: All-in-one scheme-based biomedical named entity recognition using deep learning | Code | 1
CREPE: Open-Domain Question Answering with False Presuppositions | Code | 1
Frustratingly Easy Label Projection for Cross-lingual Transfer | Code | 1
Self-supervised vision-language pretraining for Medical visual question answering | Code | 1
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1
Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging | Code | 1
Visual Commonsense-aware Representation Network for Video Captioning | Code | 1
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Code | 1
MapQA: A Dataset for Question Answering on Choropleth Maps | Code | 1
QAmeleon: Multilingual QA with Only 5 Examples | Code | 1
PromptCap: Prompt-Guided Task-Aware Image Captioning | Code | 1
Large Language Models Struggle to Learn Long-Tail Knowledge | Code | 1
Retrieval-Augmented Generative Question Answering for Event Argument Extraction | Code | 1
Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified