SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
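As a rough illustration of the two metrics named above, here is a minimal sketch of EM and token-level F1 for a single prediction and a single reference answer. The normalization steps (lowercasing, stripping punctuation and English articles) follow the convention commonly used for SQuAD-style evaluation; that convention is an assumption here, and individual benchmarks may normalize differently. The function names are illustrative and do not come from any particular library.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, under these assumptions `exact_match("The Eiffel Tower", "eiffel tower!")` gives 1.0, while `f1_score("in Paris, France", "Paris")` gives 0.5 (precision 1/3, recall 1). Benchmarks with several reference answers per question usually take the maximum score over the references; that step is left out of this sketch.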

Papers

Showing 1501–1525 of 10817 papers

Title | Status | Hype
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone | Code | 1
IndicSQuAD: A Comprehensive Multilingual Question Answering Dataset for Indic Languages | Code | 1
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models | Code | 1
CLTR: An End-to-End, Transformer-Based System for Cell Level Table Retrieval and Table Question Answering | Code | 1
BiMediX: Bilingual Medical Mixture of Experts LLM | Code | 1
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Code | 1
Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning | Code | 1
Eliminating Position Bias of Language Models: A Mechanistic Approach | Code | 1
BioBERT: a pre-trained biomedical language representation model for biomedical text mining | Code | 1
BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs | Code | 1
Empirical Study of Zero-Shot NER with ChatGPT | Code | 1
BioELECTRA:Pretrained Biomedical text Encoder using Discriminators | Code | 1
Bioformer: an efficient transformer language model for biomedical text mining | Code | 1
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1
Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Code | 1
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models | Code | 1
Emergence of Grounded Compositional Language in Multi-Agent Populations | Code | 1
CREPE: Open-Domain Question Answering with False Presuppositions | Code | 1
In Defense of Grid Features for Visual Question Answering | Code | 1
InfMLLM: A Unified Framework for Visual-Language Tasks | Code | 1
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks | Code | 1
Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering | Code | 1
IoT-LM: Large Multisensory Language Models for the Internet of Things | Code | 1
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions | Code | 1
Large Language Models in the Clinic: A Comprehensive Benchmark | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified