SOTAVerified

Question Answering

Question answering can be divided into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models are typically evaluated with exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
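As a rough illustration of the two metrics named above, the following is a minimal sketch of EM and token-level F1 for a single prediction. It assumes the normalization commonly used for SQuAD-style evaluation (lowercasing, removing punctuation and English articles, collapsing whitespace); other benchmarks may normalize differently, and the function names here are illustrative rather than taken from any particular evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; not every benchmark uses it.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

Leaderboards usually report these scores averaged over all questions and scaled to 0-100. When a question has several reference answers, the maximum score over the references is typically taken.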


Papers

Showing 10451-10500 of 10817 papers

Title | Status | Hype
The combination of context information to enhance simple question answering |  | 0
The Consensus Game: Language Model Generation via Equilibrium Search |  | 0
The Context-Dependent Additive Recurrent Neural Net |  | 0
The CUHK Discourse TreeBank for Chinese: Annotating Explicit Discourse Connectives for the Chinese TreeBank |  | 0
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation |  | 0
The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering |  | 0
The DBOX Corpus Collection of Spoken Human-Human and Human-Machine Dialogues |  | 0
The Death of Feature Engineering? BERT with Linguistic Features on SQuAD 2.0 |  | 0
The Development of Multimodal Lexical Resources |  | 0
The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? |  | 0
The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding |  | 0
The Effect of Natural Distribution Shift on Question Answering Models |  | 0
The Effect of Negative Sampling Strategy on Capturing Semantic Similarity in Document Embeddings |  | 0
The Empirical Impact of Data Sanitization on Language Models |  | 0
The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters |  | 0
The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction |  | 0
The Fire Thief Is Also the Keeper: Balancing Usability and Privacy in Prompts |  | 0
The First Multilingual Surface Realisation Shared Task (SR’18): Overview and Evaluation Results |  | 0
The FLaReNet Strategic Language Resource Agenda |  | 0
The Forgettable-Watcher Model for Video Question Answering |  | 0
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics |  | 0
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate |  | 0
The geometry of BERT |  | 0
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation |  | 0
The Global Banking Standards QA Dataset (GBS-QA) |  | 0
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models |  | 0
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models |  | 0
The Hallucination Tax of Reinforcement Finetuning |  | 0
The Hidden Structure -- Improving Legal Document Understanding Through Explicit Text Formatting |  | 0
The Impact of Explanations on AI Competency Prediction in VQA |  | 0
The Impact of Large Language Models on Task Automation in Manufacturing Services |  | 0
The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design |  | 0
The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service |  | 0
The KBGen Challenge |  | 0
The Language Application Grid |  | 0
THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering |  | 0
The Margarita Dialogue Corpus: A Data Set for Time-Offset Interactions and Unstructured Dialogue Systems |  | 0
The meaning of "most" for visual question answering models |  | 0
The Meaning of "Most" for Visual Question Answering Models |  | 0
The Meta-knowledge of Causality in Biomedical Scientific Discourse |  | 0
The Multilingual Paraphrase Database |  | 0
The Multi-Modal Video Reasoning and Analyzing Competition |  | 0
The Myopia of Crowds: A Study of Collective Evaluation on Stack Exchange |  | 0
The Open Framework for Developing Knowledge Base And Question Answering System |  | 0
The Physics of Text: Ontological Realism in Information Extraction |  | 0
The price of debiasing automatic metrics in natural language evaluation |  | 0
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering |  | 0
The RatioLog Project: Rational Extensions of Logical Reasoning |  | 0
The representation and extraction of quantitative information |  | 0
The Rich Event Ontology |  | 0
Page 210 of 217

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 |  | Unverified
2 | FPNet (ensemble) | EM | 90.87 |  | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 |  | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 |  | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 |  | Unverified
6 | FPNet (ensemble) | EM | 90.6 |  | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 |  | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 |  | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 |  | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 |  | Unverified