SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.

(Image credit: SQuAD)
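The EM and F1 metrics mentioned above are typically computed per question against one or more reference answers. A minimal sketch, following the SQuAD evaluation convention (lowercasing, stripping punctuation and articles before comparison); the answer strings below are illustrative only:

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD convention)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, ground_truth):
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction).split()
    gt_tokens = normalize(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))        # 1.0 after normalization
print(f1_score("in the city of Paris", "Paris"))              # partial credit: 0.4
```

On benchmarks with multiple reference answers, both metrics are usually taken as the maximum score over all references, then averaged across the dataset.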

Papers

Showing 2711–2720 of 10817 papers

| Title | Status | Hype |
|---|---|---|
| Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models | — | 0 |
| Transfer Learning Enhanced Single-choice Decision for Multi-choice Question Answering | — | 0 |
| MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch | Code | 0 |
| Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities | — | 0 |
| Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering | — | 0 |
| From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Code | 0 |
| MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Code | 4 |
| 2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion | — | 0 |
| TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya | — | 0 |
| Large Language Models in the Clinic: A Comprehensive Benchmark | Code | 1 |
Page 272 of 1082

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IE-Net (ensemble) | EM | 90.94 | — | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | — | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | — | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | — | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | — | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | — | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | — | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | — | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | — | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | — | Unverified |