SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
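As a rough illustration of the two metrics named above, the sketch below computes EM and token-level F1 for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the function names are illustrative and not taken from any particular benchmark's official scorer, and other datasets may normalize differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

When a question has several acceptable reference answers, a common convention is to take the maximum score over the references and then average over all questions; EM rewards only exact answers, while F1 gives partial credit for overlapping tokens.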

Papers

Showing 9301-9350 of 10817 papers

Title | Status | Hype
Quasar: Datasets for Question Answering by Search and Reading | Code | 0
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Code | 0
MQDD: Pre-training of Multimodal Question Duplicity Detection for Software Engineering Domain | Code | 0
CounQER: A System for Discovering and Linking Count Information in Knowledge Bases | Code | 0
ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation | Code | 0
Katecheo: A Portable and Modular System for Multi-Topic Question Answering | Code | 0
KazQAD: Kazakh Open-Domain Question Answering Dataset | Code | 0
KBAlign: Efficient Self Adaptation on Specific Knowledge Bases | Code | 0
An Empirical Study of Pre-trained Language Models in Simple Knowledge Graph Question Answering | Code | 0
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language | Code | 0
ParaShoot: A Hebrew Question Answering Dataset | Code | 0
Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering | Code | 0
K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor | Code | 0
PARMA: A Predicate Argument Aligner | Code | 0
Parser Extraction of Triples in Unstructured Text | Code | 0
COSY: COunterfactual SYntax for Cross-Lingual Understanding | Code | 0
Harnessing the Power of Prompt-based Techniques for Generating School-Level Questions using Large Language Models | Code | 0
Rethinking the Objectives of Extractive Question Answering | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
Handling Ontology Gaps in Semantic Parsing | Code | 0
Quebec Automobile Insurance Question-Answering With Retrieval-Augmented Generation | Code | 0
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Code | 0
MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension | Code | 0
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning | Code | 0
KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering | Code | 0
MSG-Chart: Multimodal Scene Graph for ChartQA | Code | 0
COSMO: Conditional SEQ2SEQ-based Mixture Model for Zero-Shot Commonsense Question Answering | Code | 0
MST5 -- Multilingual Question Answering over Knowledge Graphs | Code | 0
KERS: A Knowledge-Enhanced Framework for Recommendation Dialog Systems with Multiple Subgoals | Code | 0
HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild | Code | 0
CoSaTa: A Constraint Satisfaction Solver and Interpreted Language for Semi-Structured Tables of Sentences | Code | 0
HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making | Code | 0
Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method | Code | 0
Query-based Attention CNN for Text Similarity Map | Code | 0
Core Tokensets for Data-efficient Sequential Training of Transformers | Code | 0
RUBi: Reducing Unimodal Biases for Visual Question Answering | Code | 0
Key-Value Memory Networks for Directly Reading Documents | Code | 0
CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering | Code | 0
Query Enhanced Knowledge-Intensive Conversation via Unsupervised Joint Modeling | Code | 0
Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Code | 0
HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models | Code | 0
A Knowledge-Grounded Multimodal Search-Based Conversational Agent | Code | 0
Coreference Reasoning in Machine Reading Comprehension | Code | 0
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
CoQA: A Conversational Question Answering Challenge | Code | 0
Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models | Code | 0
Hallucination Benchmark in Medical Visual Question Answering | Code | 0
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages | Code | 0
A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Code | 0
MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified