SOTAVerified

Question Answering

Question answering can be divided into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with exact match (EM) and F1. Recent top-performing models include T5 and XLNet.

(Image credit: SQuAD)
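EM and F1 are simple to compute. The following is a minimal sketch modeled on the answer normalization used by the official SQuAD evaluation script (lowercase, strip punctuation, drop the articles "a", "an" and "the", collapse whitespace). The function names are illustrative and not part of any library, and other benchmarks may normalize answers differently. EM asks whether the normalized prediction equals a reference answer; F1 measures token overlap between the two. Each question is scored against its best-matching reference, and per-question scores are averaged over the dataset. Leaderboards typically report the averages as percentages, which is how the EM column in the benchmark table should be read. Treating an empty string as "no answer" is an assumption that follows the SQuAD 2.0 convention.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Assumed "no answer" convention: both empty -> 1.0, only one empty -> 0.0.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, ground_truths):
    """Score a prediction against every reference answer and keep the best."""
    return max(metric(prediction, gt) for gt in ground_truths)


def corpus_scores(predictions, references):
    """Average EM and F1 over a dataset, returned as percentages."""
    n = len(predictions)
    pairs = list(zip(predictions, references))
    em = sum(best_over_references(exact_match, p, refs) for p, refs in pairs)
    f1 = sum(best_over_references(f1_score, p, refs) for p, refs in pairs)
    return 100.0 * em / n, 100.0 * f1 / n


if __name__ == "__main__":
    refs = ["the Eiffel Tower", "Eiffel Tower"]
    print(best_over_references(exact_match, "Eiffel tower!", refs))  # 1.0
    print(round(f1_score("the tall Eiffel Tower", "Eiffel Tower"), 3))  # 0.8
```

In the F1 example, the article "the" is dropped during normalization, so the prediction has three tokens ("tall", "eiffel", "tower") against two reference tokens. Precision is 2/3 and recall is 1, giving F1 = 0.8, while EM for the same pair is 0.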

Papers

Showing 9251–9300 of 10817 papers

Title | Status | Hype
More Accurate Question Answering on Freebase | Code | 0
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering | Code | 0
A Neuro-Symbolic ASP Pipeline for Visual Question Answering | Code | 0
Right this way: Can VLMs Guide Us to See More to Answer Questions? | Code | 0
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Code | 0
Rethinking Label Smoothing on Multi-hop Question Answering | Code | 0
A Neural-Symbolic Approach to Natural Language Understanding | Code | 0
Joint Learning of Answer Selection and Answer Summary Generation in Community Question Answering | Code | 0
Crake: Causal-Enhanced Table-Filler for Question Answering over Large Scale Knowledge Base | Code | 0
A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs | Code | 0
TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering | Code | 0
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering | Code | 0
Parameter-Efficient Abstractive Question Answering over Tables or Text | Code | 0
Parameter Efficient Fine Tuning Llama 3.1 for Answering Arabic Legal Questions: A Case Study on Jordanian Laws | Code | 0
A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation | Code | 0
Crafting In-context Examples according to LMs' Parametric Knowledge | Code | 0
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models | Code | 0
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering | Code | 0
CQASUMM: Building References for Community Question Answering Summarization Corpora | Code | 0
COV19IR : COVID-19 Domain Literature Information Retrieval | Code | 0
Bidirectional Attention Flow for Machine Comprehension | Code | 0
MovieQA: Understanding Stories in Movies through Question-Answering | Code | 0
ArNLI: Arabic Natural Language Inference for Entailment and Contradiction Detection | Code | 0
A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models | Code | 0
Moving Beyond the Turing Test with the Allen AI Science Challenge | Code | 0
Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Code | 0
HeroNet: A Hybrid Retrieval-Generation Network for Conversational Bots | Code | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0
JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction | Code | 0
Coupling Context Modeling with Zero Pronoun Recovering for Document-Level Natural Language Generation | Code | 0
A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining | Code | 0
ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos | Code | 0
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning? | Code | 0
MphayaNER: Named Entity Recognition for Tshivenda | Code | 0
A Better Way to Attend: Attention with Trees for Video Question Answering | Code | 0
Counting Everyday Objects in Everyday Scenes | Code | 0
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Code | 0
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue | Code | 0
'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Code | 0
Counterfactual Learning from Human Proofreading Feedback for Semantic Parsing | Code | 0
Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive Summarization | Code | 0
Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game | Code | 0
KaFSP: Knowledge-Aware Fuzzy Semantic Parsing for Conversational Question Answering over a Large-Scale Knowledge Base | Code | 0
RISC: Generating Realistic Synthetic Bilingual Insurance Contract | Code | 0
KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers | Code | 0
MQA: Answering the Question via Robotic Manipulation | Code | 0
RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models | Code | 0
HCqa: Hybrid and Complex Question Answering on Textual Corpus and Knowledge Graph | Code | 0
HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus | Code | 0
Counterfactual Adversarial Learning with Representation Interpolation | Code | 0
Page 186 of 217

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified