SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, and WikiQA, among others. Question answering models are typically evaluated with Exact Match (EM) and F1 scores. Recent top-performing models include T5 and XLNet.
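The EM and F1 metrics mentioned above follow the SQuAD evaluation convention: answers are normalized (lowercased, with punctuation, articles, and extra whitespace removed), EM checks for an exact normalized string match, and F1 measures token-level overlap between prediction and reference. A minimal sketch of that convention (function names are illustrative, not from any official script):

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style answer normalization: lowercase, strip punctuation,
    remove articles (a/an/the), and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """EM: 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts tokens shared by both answers.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower!", "eiffel tower")` returns 1 after normalization, while a partial answer like `f1_score("the tower in Paris", "eiffel tower")` yields 0.4 (one shared token, precision 1/3, recall 1/2). Benchmark leaderboards average these per-question scores over the whole test set.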

(Image credit: SQuAD)

Papers

Showing 2691–2700 of 10817 papers

Title | Status | Hype
A Probabilistic Annotation Model for Crowdsourcing Coreference | | 0
AIA-BDE: A Corpus of FAQs in Portuguese and their Variations | | 0
Actively Seeking and Learning from Live Data | | 0
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering | | 0
Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | | 0
A Pretraining Numerical Reasoning Model for Ordinal Constrained Question Answering on Knowledge Base | | 0
Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis | | 0
Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation | | 0
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | | 0
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios | | 0
Page 270 of 1082

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified