SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
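As a rough illustration of the two metrics named above, the sketch below computes EM and token-level F1 for a single predicted answer against a single reference answer. It assumes the normalization convention commonly used for SQuAD-style evaluation (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names `normalize`, `exact_match`, and `f1_score` are illustrative choices, not part of any benchmark's official tooling.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the SQuAD-style answer normalization convention.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `exact_match("The Eiffel Tower", "eiffel tower!")` gives 1.0, while `f1_score("in Paris France", "Paris")` gives 0.5 (precision 1/3, recall 1). Benchmark-level scores are usually reported as the average over all questions, scaled to a percentage; when a question has several reference answers, a common choice is to take the maximum score over them.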


Papers

Showing 2651-2700 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | | 0 |
| A Question Answering Approach to Emotion Cause Extraction | | 0 |
| Efficient In-Domain Question Answering for Resource-Constrained Environments | | 0 |
| Capabilities of Gemini Models in Medicine | | 0 |
| Can You Unpack That? Learning to Rewrite Questions-in-Context | | 0 |
| A Question Answering Approach for Emotion Cause Extraction | | 0 |
| Can you even tell left from right? Presenting a new challenge for VQA | | 0 |
| Can MLLMs Generalize to Multi-Party dialog? Exploring Multilingual Response Generation in Complex Scenarios | | 0 |
| Active Reasoning in an Open-World Environment | | 0 |
| Efficient Knowledge Feeding to Language Models: A Novel Integrated Encoder-Decoder Architecture | | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | | 0 |
| Can We Infer Confidential Properties of Training Data from LLMs? | | 0 |
| A Quantitative Evaluation of Natural Language Question Interpretation for Question Answering Systems | | 0 |
| Active Reading Comprehension: A Dataset for Learning the Question-Answer Relationship Strategy | | 0 |
| Can We Generate Visual Programs Without Prompting LLMs? | | 0 |
| Can We Create a Tool for General Domain Event Analysis? | | 0 |
| Document-level Event Extraction with Efficient End-to-end Learning of Cross-event Dependencies | | 0 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | | 0 |
| AI-KU: Using Co-Occurrence Modeling for Semantic Similarity | | 0 |
| Can Vision-Language Models Answer Face to Face Questions in the Real-World? | | 0 |
| Abductive Matching in Question Answering | | 0 |
| EfficientEQA: An Efficient Approach for Open Vocabulary Embodied Question Answering | | 0 |
| Can Transformers Reason About Effects of Actions? | | 0 |
| Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | | 0 |
| Do Question Answering Modeling Improvements Hold Across Benchmarks? | | 0 |
| Can SAR improve RSVQA performance? | | 0 |
| Efficient crowdsourcing of crowd-generated microtasks | | 0 |
| A Procedural Definition of Multi-word Lexical Units | | 0 |
| Can Question Generation Debias Question Answering Models? A Case Study on Question–Context Lexical Overlap | | 0 |
| AiFu at SemEval-2019 Task 10: A Symbolic and Sub-symbolic Integrated System for SAT Math Question Answering | | 0 |
| Efficient Deployment of Conversational Natural Language Interfaces over Databases | | 0 |
| Efficient Few-Shot Continual Learning in Vision-Language Models | | 0 |
| A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images | | 0 |
| Can Pre-training help VQA with Lexical Variations? | | 0 |
| AIDA: Artificial Intelligent Dialogue Agent | | 0 |
| Can predicate-argument relationships be extracted from UD trees? | | 0 |
| A Probabilistic-Logic based Commonsense Representation Framework for Modelling Inferences with Multiple Antecedents and Varying Likelihoods | | 0 |
| Evaluating the Ebb and Flow: An In-depth Analysis of Question-Answering Trends across Diverse Platforms | | 0 |
| Can Open Domain Question Answering Systems Answer Visual Knowledge Questions? | | 0 |
| A Probabilistic Lexical Model for Ranking Textual Inferences | | 0 |
| A Probabilistic Annotation Model for Crowdsourcing Coreference | | 0 |
| AIA-BDE: A Corpus of FAQs in Portuguese and their Variations | | 0 |
| Actively Seeking and Learning from Live Data | | 0 |
| Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering | | 0 |
| Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | | 0 |
| A Pretraining Numerical Reasoning Model for Ordinal Constrained Question Answering on Knowledge Base | | 0 |
| Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis | | 0 |
| Can LLMs assist with Ambiguity? A Quantitative Evaluation of various Large Language Models on Word Sense Disambiguation | | 0 |
| A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | | 0 |
| Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios | | 0 |

Benchmark Results

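| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |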