SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
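As a rough illustration of the two metrics named above, the following minimal sketch computes exact match and token-level F1 for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names `normalize`, `exact_match`, and `f1_score` are illustrative and not taken from any particular benchmark's evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: this mirrors the normalization commonly used for
    SQuAD-style evaluation; other benchmarks may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(f1_score("in Paris, France", "Paris"))
```

Leaderboards usually report these scores averaged over all questions and, when several reference answers exist, take the maximum over references for each question; that aggregation is omitted here.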


Papers

Showing 10651–10700 of 10817 papers

Title | Status | Hype
Think Visually: Question Answering through Virtual Imagery | Code | 0
Symbolic Priors for RNN-based Semantic Parsing | Code | 0
SyllabusQA: A Course Logistics Question Answering Dataset | Code | 0
Unifying Text, Tables, and Images for Multimodal Question Answering | Code | 0
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models | Code | 0
SwissAlps at SemEval-2017 Task 3: Attention-based Convolutional Neural Network for Community Question Answering | Code | 0
Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering | Code | 0
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering | Code | 0
Thieves on Sesame Street! Model Extraction of BERT-based APIs | Code | 0
SWI: Speaking with Intent in Large Language Models | Code | 0
UNIMELB at SemEval-2016 Tasks 4A and 4B: An Ensemble of Neural Networks and a Word2Vec Based Model for Sentiment Classification | Code | 0
Simple Applications of BERT for Ad Hoc Document Retrieval | Code | 0
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | Code | 0
Simple and Effective Text Matching with Richer Alignment Features | Code | 0
They Exist! Introducing Plural Mentions to Coreference Resolution and Entity Linking | Code | 0
Supervised Knowledge Makes Large Language Models Better In-context Learners | Code | 0
Self Question-answering: Aspect-based Sentiment Analysis by Role Flipped Machine Reading Comprehension | Code | 0
UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding | Code | 0
The TechQA Dataset | Code | 0
The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries | Code | 0
The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing | Code | 0
The representation landscape of few-shot learning and fine-tuning in large language models | Code | 0
UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models | Code | 0
SemEval-2019 Task 10: Math Question Answering | Code | 0
WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset | Code | 0
SUNNYNLP at SemEval-2018 Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Difference using Taxonomy and Word Embedding Features | Code | 0
There is No Big Brother or Small Brother: Knowledge Infusion in Language Models for Link Prediction and Question Answering | Code | 0
Similar Cases Recommendation using Legal Knowledge Graphs | Code | 0
WikiCausal: Corpus and Evaluation Framework for Causal Knowledge Graph Construction | Code | 0
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments | Code | 0
Sim2Real Transfer for Vision-Based Grasp Verification | Code | 0
The Promise of Premise: Harnessing Question Premises in Visual Question Answering | Code | 0
The price of debiasing automatic metrics in natural language evaluation | Code | 0
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision | Code | 0
WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning | Code | 0
Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity | Code | 0
Universal Semantic Parsing | Code | 0
Weisfeiler and Leman Go Relational | Code | 0
Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering | Code | 0
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language | Code | 0
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning | Code | 0
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation | Code | 0
Structural Self-Supervised Objectives for Transformers | Code | 0
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models | Code | 0
Evaluating Search Engines and Large Language Models for Answering Health Questions | Code | 0
Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding | Code | 0
The NarrativeQA Reading Comprehension Challenge | Code | 0
Unlocking Markets: A Multilingual Benchmark to Cross-Market Question Answering | Code | 0
Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Code | 0
Stochastic Answer Networks for SQuAD 2.0 | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | - | Unverified
2 | FPNet (ensemble) | EM | 90.87 | - | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | - | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | - | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | - | Unverified
6 | FPNet (ensemble) | EM | 90.6 | - | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | - | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | - | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | - | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | - | Unverified