SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
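As a rough illustration of the EM and F1 metrics mentioned above, here is a minimal sketch for a single prediction and reference answer. It assumes the answer normalization commonly used for SQuAD-style evaluation (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace); the function names are illustrative and not taken from any particular benchmark's official script, and individual benchmarks may normalize differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the usual SQuAD-style normalization.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

When a question has several acceptable reference answers, a common convention is to take the maximum score over the references and then average over all questions; that aggregation step is left out of this sketch.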

Papers

Showing 7001–7050 of 10817 papers

Title | Status | Hype
Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases |  | 0
Occam's Gates |  | 0
Constraint-based Multi-hop Question Answering with Knowledge Graph |  | 0
Adding Context to Semantic Data-Driven Paraphrasing |  | 0
Evaluating the Retrieval Component in LLM-Based Question Answering Systems |  | 0
TMLab SRPOL at SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums |  | 0
Harnessing Multilingual Resources to Question Answering in Arabic |  | 0
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models |  | 0
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education |  | 0
oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain |  | 0
Constraint Based Description of Polish Multiword Expressions |  | 0
Harnessing Large Vision and Language Models in Agriculture: A Review |  | 0
Olelo: A Question Answering Application for Biomedicine |  | 0
OMCAT: Omni Context Aware Transformer |  | 0
Constant Time Graph Neural Networks |  | 0
A Survey for Efficient Open Domain Question Answering |  | 0
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System |  | 0
Optimization of Retrieval-Augmented Generation Context with Outlier Detection |  | 0
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models |  | 0
Optimizing Inference Performance of Transformers on CPUs |  | 0
Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models |  | 0
Evaluation for Partial Event Coreference |  | 0
Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order 14110 |  | 0
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM? |  | 0
Hard to Cheat: A Turing Test based on Answering Questions about Images |  | 0
A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT |  | 0
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation |  | 0
OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context |  | 0
Accelerating Manufacturing Scale-Up from Material Discovery Using Agentic Web Navigation and Retrieval-Augmented AI for Process Engineering Schematics Design |  | 0
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization |  | 0
Opinion Holder and Target Extraction on Opinion Compounds – A Linguistic Approach |  | 0
Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers |  | 0
On-Demand Distributional Semantic Distance and Paraphrasing |  | 0
On-demand Injection of Lexical Knowledge for Recognising Textual Entailment |  | 0
Handling Multiword Expressions in Causality Estimation |  | 0
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities |  | 0
Handling Anomalies of Synthetic Questions in Unsupervised Question Answering |  | 0
Hand in Glove: Deep Feature Fusion Network Architectures for Answer Quality Prediction in Community Question Answering |  | 0
HAMMR: HierArchical MultiModal React agents for generic VQA |  | 0
A Supervised Approach for Enriching the Relational Structure of Frame Semantics in FrameNet |  | 0
A Multi-answer Multi-task Framework for Real-world Machine Reading Comprehension |  | 0
Evaluation of medium-large Language Models at zero-shot closed book generative question answering |  | 0
Opinion Mining with Deep Recurrent Neural Networks |  | 0
OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop Approach |  | 0
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering |  | 0
On Evaluating Embedding Models for Knowledge Base Completion |  | 0
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering |  | 0
One Vector is Not Enough: Entity-Augmented Distributed Semantics for Discourse Relations |  | 0
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces |  | 0
ConSens: Assessing context grounding in open-book question answering |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 |  | Unverified
2 | FPNet (ensemble) | EM | 90.87 |  | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 |  | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 |  | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 |  | Unverified
6 | FPNet (ensemble) | EM | 90.6 |  | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 |  | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 |  | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 |  | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 |  | Unverified