SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
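
The two metrics named above can be sketched as follows. This is a minimal illustration that assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles) and token-level overlap for F1; individual benchmarks may normalize differently or take the maximum over several reference answers. The function names `normalize_answer`, `exact_match`, and `token_f1` are chosen here for illustration and are not taken from any particular library.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Assumption: two empty answers count as agreement,
        # which matters for unanswerable questions.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(token_f1("in Paris France", "Paris"))
```

Scores like those in a leaderboard are usually these per-question values averaged over the whole evaluation set and multiplied by 100.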

Papers

Showing 10701-10750 of 10817 papers

Title | Status | Hype
Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic | Code | 0
NatLan: Native Language Prompting Facilitates Knowledge Elicitation Through Language Trigger Provision and Domain Trigger Retention | Code | 0
Visual Dialogue without Vision or Dialogue | Code | 0
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Code | 0
Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering | Code | 0
Zero-Shot Rationalization by Multi-Task Transfer Learning from Question Answering | Code | 0
Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal | Code | 0
The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models | Code | 0
Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models | Code | 0
XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models | Code | 0
Stochastic Answer Networks for Machine Reading Comprehension | Code | 0
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization | Code | 0
What Can Neural Networks Reason About? | Code | 0
What Can Secondary Predictions Tell Us? An Exploration on Question-Answering with SQuAD-v2.0 | Code | 0
What Can We Learn From Almost a Decade of Food Tweets | Code | 0
Step by step: a hierarchical framework for multi-hop knowledge graph reasoning with reinforcement learning | Code | 0
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation | Code | 0
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering | Code | 0
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia | Code | 0
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering | Code | 0
The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems | Code | 0
The Effect of Masking Strategies on Knowledge Retention by Language Models | Code | 0
Unsupervised Multiple Choices Question Answering: Start Learning from Basic Knowledge | Code | 0
AugTriever: Unsupervised Dense Retrieval and Domain Adaptation by Scalable Data Augmentation | Code | 0
Unsupervised Dense Retrieval Training with Web Anchors | Code | 0
SRQA: Synthetic Reader for Factoid Question Answering | Code | 0
Whatcha lookin' at? DeepLIFTing BERT's Attention in Question Answering | Code | 0
Visually Dehallucinative Instruction Generation | Code | 0
Visually Grounded VQA by Lattice-based Retrieval | Code | 0
SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | Code | 0
Unsupervised Improvement of Factual Knowledge in Language Models | Code | 0
Sigma: A dataset for text-to-code semantic parsing with statistical analysis | Code | 0
Visually Interpretable Subtask Reasoning for Visual Question Answering | Code | 0
The Devil is in the Details: Evaluating Limitations of Transformer-based Methods for Granular Tasks | Code | 0
What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge | Code | 0
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction | Code | 0
Unsupervised Matching of Data and Text | Code | 0
The BLue Amazon Brain (BLAB): A Modular Architecture of Services about the Brazilian Maritime Territory | Code | 0
Self-Critical Reasoning for Robust Visual Question Answering | Code | 0
SQL Generation via Machine Reading Comprehension | Code | 0
SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU | Code | 0
Unsupervised Natural Language Generation with Denoising Autoencoders | Code | 0
Siamese Tracking with Lingual Object Constraints | Code | 0
Speed Reading: Learning to Read ForBackward via Shuttle | Code | 0
Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue | Code | 0
Zero-shot User Intent Detection via Capsule Neural Networks | Code | 0
Unsupervised Question Answering by Cloze Translation | Code | 0
YTCommentQA: Video Question Answerability in Instructional Videos | Code | 0
Unsupervised Question Answering via Answer Diversifying | Code | 0
Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification? | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified