SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
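As a rough illustration of the two metrics named above, here is a minimal Python sketch of exact match and token-level F1 for a single prediction against a single reference answer. The normalization (lowercasing, stripping punctuation and English articles) follows the style commonly used for SQuAD-type evaluation, but it is an assumption here, not a copy of any benchmark's official scorer. The function names `normalize_answer`, `exact_match`, and `f1_score` are chosen for this example.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several reference answers, a common convention is to take the maximum score over the references and then average over all questions; leaderboard figures are usually reported as that average multiplied by 100.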


Papers

Showing 9626–9650 of 10817 papers

Title | Status | Hype
Compositional Image-Text Matching and Retrieval by Grounding Entities | Code | 0
Analyzing Sustainability Reports Using Natural Language Processing | Code | 0
Findings of the VarDial Evaluation Campaign 2022 | Code | 0
Finding Generalizable Evidence by Learning to Convince Q&A Models | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
Question Answering as an Automatic Evaluation Metric for News Article Summarization | Code | 0
Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph | Code | 0
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions | Code | 0
Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains | Code | 0
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization | Code | 0
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | Code | 0
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Code | 0
Benchmarking Long-tail Generalization with Likelihood Splits | Code | 0
Multi-Sourced Compositional Generalization in Visual Question Answering | Code | 0
Benchmarking LLM-based Relevance Judgment Methods | Code | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
Few-Shot Upsampling for Protest Size Detection | Code | 0
Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering | Code | 0
Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution | Code | 0
Question Answering through Transfer Learning from Large Fine-grained Supervision Data | Code | 0
Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization | Code | 0
Few-Shot Multimodal Explanation for Visual Question Answering | Code | 0
Few-Shot Multilingual Open-Domain QA from 5 Examples | Code | 0
Learned in Translation: Contextualized Word Vectors | Code | 0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Code | 0
Page 386 of 433

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified