SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA, among many others. Models are typically evaluated on metrics such as exact match (EM) and F1. Recent top-performing models include T5 and XLNet.

( Image credit: SQuAD )
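The EM and F1 metrics mentioned above are usually computed SQuAD-style: answers are normalized (lowercased, punctuation and articles stripped), EM checks for an exact string match, and F1 is the token-overlap harmonic mean of precision and recall. A minimal sketch of that convention (function names are illustrative, not from any particular library):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD convention)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1 if the normalized prediction equals the normalized gold answer, else 0."""
    return int(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Benchmark scores report these metrics averaged over all questions; when a question has multiple reference answers, the maximum score over references is typically taken.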

Papers

Showing 10001–10025 of 10817 papers

| Title | Status | Hype |
|---|---|---|
| NOAHQA: Numerical Reasoning with Interpretable Graph Question Answering Dataset | Code | 0 |
| No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Code | 0 |
| Relation Extraction with Instance-Adapted Predicate Descriptions | Code | 0 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Code | 0 |
| EQA-RM: A Generative Embodied Reward Model with Test-time Scaling | Code | 0 |
| ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images | Code | 0 |
| Decomposed Prompting to Answer Questions on a Course Discussion Board | Code | 0 |
| AttenWalker: Unsupervised Long-Document Question Answering via Attention-based Graph Walking | Code | 0 |
| Episodic Memory Reader: Learning What to Remember for Question Answering from Streaming Data | Code | 0 |
| No Length Left Behind: Enhancing Knowledge Tracing for Modeling Sequences of Excessive or Insufficient Lengths | Code | 0 |
| AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit | Code | 0 |
| Episodic Memory in Lifelong Language Learning | Code | 0 |
| Review-guided Helpful Answer Identification in E-commerce | Code | 0 |
| LLM Robustness Against Misinformation in Biomedical Question Answering | Code | 0 |
| Sentence Embeddings for Russian NLU | Code | 0 |
| EpiK-Eval: Evaluation for Language Models as Epistemic Models | Code | 0 |
| Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks | Code | 0 |
| Entropy-Based Decoding for Retrieval-Augmented Large Language Models | Code | 0 |
| Active Learning to Guide Labeling Efforts for Question Difficulty Estimation | Code | 0 |
| Climate Finance Bench | Code | 0 |
| LLM-SQL-Solver: Can LLMs Determine SQL Equivalence? | Code | 0 |
| ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions | Code | 0 |
| No One is Perfect: Analysing the Performance of Question Answering Components over the DBpedia Knowledge Graph | Code | 0 |
| Entity-Relation Extraction as Multi-Turn Question Answering | Code | 0 |
| EntGPT: Linking Generative Large Language Models with Knowledge Bases | Code | 0 |
Page 401 of 433

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |