SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
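
As a rough illustration of the two metrics named above, the following is a minimal sketch of exact match and token-level F1 in the style commonly used for extractive QA benchmarks such as SQuAD. The function names (`normalize_answer`, `exact_match`, `token_f1`) and the normalization steps are assumptions made for this example; individual benchmarks may normalize answers differently or score against several reference answers.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(token_f1("in Paris, France", "Paris"))
```

EM rewards only a full match after normalization, while F1 gives partial credit for overlapping tokens, which is why leaderboards often report both.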


Papers

Showing 1901-1950 of 10817 papers

Title | Status | Hype
Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1
Check It Again:Progressive Visual Question Answering via Visual Entailment | Code | 1
Complex Temporal Question Answering on Knowledge Graphs | Code | 1
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
Complex Knowledge Base Question Answering: A Survey | Code | 1
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification | Code | 1
Just Ask: Learning to Answer Questions from Millions of Narrated Videos | Code | 1
Compositional Exemplars for In-context Learning | Code | 1
Synthesizing Event-centric Knowledge Graphs of Daily Activities Using Virtual Space | Code | 1
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences | Code | 1
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters | Code | 1
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models | Code | 1
KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search | Code | 1
Learning Trimodal Relation for AVQA with Missing Modality | Code | 1
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | Code | 1
Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering | Code | 1
A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models | Code | 1
Learning to Retrieve Passages without Supervision | Code | 1
Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata | Code | 1
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | Code | 1
Learning to Poison Large Language Models for Downstream Manipulation | Code | 1
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering | Code | 1
Kformer: Knowledge Injection in Transformer Feed-Forward Layers | Code | 1
Less is More: Data-Efficient Complex Question Answering over Knowledge Bases | Code | 1
KGE-CL: Contrastive Learning of Tensor Decomposition Based Knowledge Graph Embeddings | Code | 1
Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering | Code | 1
KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models | Code | 1
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors | Code | 1
KILT: a Benchmark for Knowledge Intensive Language Tasks | Code | 1
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Code | 1
A Dataset for Medical Instructional Video Classification and Question Answering | Code | 1
KLEJ: Comprehensive Benchmark for Polish Language Understanding | Code | 1
Knowledge-Augmented Language Model Verification | Code | 1
Knowledge Base Question Answering by Case-based Reasoning over Subgraphs | Code | 1
Task-Oriented Multi-User Semantic Communications for VQA Task | Code | 1
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions | Code | 1
Citekit: A Modular Toolkit for Large Language Model Citation Generation | Code | 1
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant | Code | 1
CKBP v2: Better Annotation and Reasoning for Commonsense Knowledge Base Population | Code | 1
Learning to Discretely Compose Reasoning Module Networks for Video Captioning | Code | 1
Learning to Ask Like a Physician | Code | 1
Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering | Code | 1
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | Code | 1
A Memory Efficient Baseline for Open Domain Question Answering | Code | 1
ClarQ: A large-scale and diverse dataset for Clarification Question Generation | Code | 1
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer | Code | 1
Learning to Attribute with Attention | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified