SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated with metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
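As a rough illustration of the EM and F1 metrics mentioned above, here is a minimal sketch in the style of SQuAD-like extractive QA scoring. The function names (`normalize`, `exact_match`, `f1_score`) and the exact normalization steps are assumptions made for this example; individual benchmarks may normalize differently and usually take the maximum score over several reference answers.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors common SQuAD-style normalization; other
    benchmarks may use different rules.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and one reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(round(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"), 4))
```

EM rewards only answers that match the reference after normalization, while F1 gives partial credit for overlapping tokens, which is why leaderboards often report both.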


Papers

Showing 1651-1675 of 10817 papers

Title | Status | Hype
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Code | 1
Multimodal Federated Learning via Contrastive Representation Ensemble | Code | 1
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text | Code | 1
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering | Code | 1
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences | Code | 1
Multimodal Prompt Retrieval for Generative Visual Question Answering | Code | 1
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors | Code | 1
DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs | Code | 1
Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion | Code | 1
Building Efficient and Effective OpenQA Systems for Low-Resource Languages | Code | 1
Multi-Relational Embedding for Knowledge Graph Representation and Analysis | Code | 1
MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models | Code | 1
Dialog Inpainting: Turning Documents into Dialogs | Code | 1
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Code | 1
Detecting Hate Speech in Multi-modal Memes | Code | 1
nach0: Multimodal Natural and Chemical Languages Foundation Model | Code | 1
Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources | Code | 1
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
kNN-Prompt: Nearest Neighbor Zero-Shot Inference | Code | 1
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation | Code | 1
ByT5: Towards a token-free future with pre-trained byte-to-byte models | Code | 1
Detecting and Preventing Hallucinations in Large Vision Language Models | Code | 1
CABINET: Content Relevance based Noise Reduction for Table Question Answering | Code | 1
DeVLBert: Learning Deconfounded Visio-Linguistic Representations | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 |  | Unverified
2 | FPNet (ensemble) | EM | 90.87 |  | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 |  | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 |  | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 |  | Unverified
6 | FPNet (ensemble) | EM | 90.6 |  | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 |  | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 |  | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 |  | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 |  | Unverified