SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
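
As a rough illustration of the two metrics named above, the following is a minimal sketch assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names `normalize`, `exact_match`, and `f1_score` are hypothetical and chosen for this example; individual benchmarks may normalize answers differently or take the maximum score over several reference answers.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Assumption: if either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

EM rewards only answers that match the reference after normalization, while F1 gives partial credit for overlapping tokens, which is why leaderboards commonly report both.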


Papers

Showing 9901-9950 of 10817 papers

Title | Status | Hype
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models | Code | 0
Neural Semantic Parsing with Type Constraints for Semi-Structured Tables | Code | 0
Neural Shuffle-Exchange Networks -- Sequence Processing in O(n log n) Time | Code | 0
Lexicalization Is All You Need: Examining the Impact of Lexical Knowledge in a Compositional QALD System | Code | 0
Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time | Code | 0
Neural Stored-program Memory | Code | 0
Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision | Code | 0
Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains | Code | 0
EXAQ: Exponent Aware Quantization For LLMs Acceleration | Code | 0
Scaling Reasoning can Improve Factuality in Large Language Models | Code | 0
LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews | Code | 0
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Code | 0
ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models | Code | 0
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios | Code | 0
Co-attending Regions and Detections with Multi-modal Multiplicative Embedding for VQA | Code | 0
Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Code | 0
Evidence Sentence Extraction for Machine Reading Comprehension | Code | 0
Neural Variational Inference for Text Processing | Code | 0
DiffQue: Estimating Relative Difficulty of Questions in Community Question Answering Services | Code | 0
Quizbowl: The Case for Incremental Question Answering | Code | 0
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | Code | 0
Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering | Code | 0
QPaug: Question and Passage Augmentation for Open-Domain Question Answering of LLMs | Code | 0
Audiopedia: Audio QA with Knowledge | Code | 0
Lightweight Recurrent Cross-modal Encoder for Video Question Answering | Code | 0
Likelihood as a Performance Gauge for Retrieval-Augmented Generation | Code | 0
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering | Code | 0
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering | Code | 0
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
Active Learning to Guide Labeling Efforts for Question Difficulty Estimation | Code | 0
Neuro-Symbolic Visual Dialog | Code | 0
Event Knowledge Incorporation with Posterior Regularization for Event-Centric Question Answering | Code | 0
Relation-Aware Graph Attention Network for Visual Question Answering | Code | 0
Event Detection as Question Answering with Entity Information | Code | 0
Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays | Code | 0
Relation-aware Hierarchical Attention Framework for Video Question Answering | Code | 0
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC | Code | 0
CNN for Text-Based Multiple Choice Question Answering | Code | 0
CNM: An Interpretable Complex-valued Network for Matching | Code | 0
Primacy Effect of ChatGPT | Code | 0
Attribute Diversity Determines the Systematicity Gap in VQA | Code | 0
CODAH: An Adversarially Authored Question-Answer Dataset for Common Sense | Code | 0
Event-Centric Question Answering via Contrastive Learning and Invertible Event Transformation | Code | 0
Attributed and Predictive Entity Embedding for Fine-Grained Entity Typing in Knowledge Bases | Code | 0
AttenWalker: Unsupervised Long-Document Question Answering via Attention-based Graph Walking | Code | 0
Evaluation of Semantic Answer Similarity Metrics | Code | 0
LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering | Code | 0
AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit | Code | 0
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ | Code | 0
NewsQuote: A Dataset Built on Quote Extraction and Attribution for Expert Recommendation in Fact-Checking | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | - | Unverified
2 | FPNet (ensemble) | EM | 90.87 | - | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | - | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | - | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | - | Unverified
6 | FPNet (ensemble) | EM | 90.6 | - | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | - | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | - | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | - | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | - | Unverified