SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with exact match (EM) and F1. Recent top-performing models include T5 and XLNet.

(Image credit: SQuAD)
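The EM and F1 metrics mentioned above are typically computed per question against one or more gold answers. The sketch below follows the standard SQuAD-style recipe (lowercase, strip punctuation and English articles, collapse whitespace, then compare): EM checks normalized string equality, while F1 measures token overlap between prediction and gold answer. This is an illustrative reimplementation, not the official evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower.", "eiffel tower")` is 1.0 after normalization, while a partially overlapping prediction earns partial F1 credit. When a question has multiple gold answers, both metrics are usually taken as the maximum over the gold set, and the benchmark score is the average over all questions.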

Papers

Showing 1631–1640 of 10,817 papers

Title | Status | Hype
FinQAPT: Empowering Financial Decisions with End-to-End LLM-driven Question Answering Pipeline | — | 0
A Little Human Data Goes A Long Way | Code | 0
Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations | Code | 0
BQA: Body Language Question Answering Dataset for Video Large Language Models | — | 0
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Code | 0
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models | Code | 2
Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models | — | 0
Advancing Large Language Model Attribution through Self-Improving | — | 0
RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | — | 0
AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | — | 0
Page 164 of 1082

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | — | Unverified
2 | FPNet (ensemble) | EM | 90.87 | — | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | — | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | — | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | — | Unverified
6 | FPNet (ensemble) | EM | 90.6 | — | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | — | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | — | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | — | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | — | Unverified