SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
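As a rough illustration of the EM and F1 metrics mentioned above, the following minimal sketch scores one predicted answer string against one reference answer. The normalization steps (lowercasing, stripping punctuation and English articles) are an assumption modeled on common SQuAD-style evaluation; individual benchmarks may normalize differently, and the function names here are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only a full match after normalization, while F1 gives partial credit for overlapping tokens, which is why leaderboards often report both. When a question has several reference answers, a common convention is to take the maximum score over the references.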


Papers

Showing 1901-1925 of 10817 papers

Title | Status | Hype
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering | Code | 1
Single-Pass Document Scanning for Question Answering | Code | 1
Skipping Computations in Multimodal LLMs | Code | 1
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering | Code | 1
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Code | 1
Detecting Hate Speech in Multi-modal Memes | Code | 1
SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining | Code | 1
Soft Prompting for Unlearning in Large Language Models | Code | 1
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Code | 1
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax | Code | 1
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models | Code | 1
Sparse Continuous Distributions and Fenchel-Young Losses | Code | 1
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA | Code | 1
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models | Code | 1
Contextualized Sparse Representations for Real-Time Open-Domain Question Answering | Code | 1
Dense Hierarchical Retrieval for Open-Domain Question Answering | Code | 1
Latent Retrieval for Weakly Supervised Open Domain Question Answering | Code | 1
ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models | Code | 1
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding | Code | 1
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models | Code | 1
DELIFT: Data Efficient Language model Instruction Fine Tuning | Code | 1
Delaying Interaction Layers in Transformer-based Encoders for Efficient Open Domain Question Answering | Code | 1
Densely Connected Attention Propagation for Reading Comprehension | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified