SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
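As a rough illustration of the EM and F1 metrics mentioned above, here is a minimal sketch in Python. It assumes the SQuAD-style convention of normalizing answers (lowercasing, stripping punctuation and the articles "a", "an", "the") before comparison, and computes F1 over whitespace tokens. The function names are hypothetical and other benchmarks may define these metrics differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors common SQuAD-style normalization; it is not
    taken from this page.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several reference answers, a common choice is to take the maximum score over the references and then average over all questions; leaderboards usually report that average as a percentage.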

Papers

Showing 7651–7700 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning | | 0 |
| I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering | | 0 |
| DaNetQA: a yes/no Question Answering Dataset for the Russian Language | | 0 |
| Prune Once for All: Sparse Pre-Trained Language Models | | 0 |
| It was the training data pruning too! | | 0 |
| It Takes Two to Tango: Towards Theory of AI's Mind | | 0 |
| It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance | | 0 |
| PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems | | 0 |
| Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning | | 0 |
| Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing | | 0 |
| Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models | | 0 |
| An Efficient Active Learning Framework for New Relation Types | | 0 |
| It Takes Three to Tango: Triangulation Approach to Answer Ranking in Community Question Answering | | 0 |
| It's High Time: A Survey of Temporal Information Retrieval and Question Answering | | 0 |
| It's About Time: Incorporating Temporality in Retrieval Augmented Language Models | | 0 |
| Punctuation Prediction with Transition-based Parsing | | 0 |
| ITNLP-AiKF at SemEval-2016 Task 3 a quesiton answering system using community QA repository | | 0 |
| Pushing the boundary on Natural Language Inference | | 0 |
| Pushing the Limits of AMR Parsing with Self-Learning | | 0 |
| Pushing the Limits of ChatGPT on NLP Tasks | | 0 |
| DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection | | 0 |
| Pushing the Limits of Radiology with Joint Modeling of Visual and Textual Information | | 0 |
| Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia | | 0 |
| It is AI’s Turn to Ask Humans a Question: Question-Answer Pair Generation for Children’s Story Books | | 0 |
| It is AI’s Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset | | 0 |
| PuzzleBench: Can LLMs Solve Challenging First-Order Combinatorial Reasoning Problems? | | 0 |
| DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering | | 0 |
| PVChat: Personalized Video Chat with One-Shot Learning | | 0 |
| ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset | | 0 |
| Pyramid-Driven Alignment: Pyramid Principle Guided Integration of Large Language Models and Knowledge Graphs | | 0 |
| DADgraph: A Discourse-aware Dialogue Graph Neural Network for Multiparty Dialogue Machine Reading Comprehension | | 0 |
| Automated Utterance Generation | | 0 |
| An Effective Multi-Stage Approach For Question Answering | | 0 |
| Advancements and Challenges in Bangla Question Answering Models: A Comprehensive Review | | 0 |
| Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy | | 0 |
| Q^2Forge: Minting Competency Questions and SPARQL Queries for Question-Answering Over Knowledge Graphs | | 0 |
| Iterative Scene Graph Generation with Generative Transformers | | 0 |
| Iterative Multi-document Neural Attention for Multiple Answer Prediction | | 0 |
| Iterative Adversarial Attack on Image-guided Story Ending Generation | | 0 |
| Iterated learning for emergent systematicity in VQA | | 0 |
| Cycle-Consistency for Robust Visual Question Answering | | 0 |
| Automated Testing and Improvement of Named Entity Recognition Systems | | 0 |
| An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features | | 0 |
| Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility | | 0 |
| "Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration | | 0 |
| CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation | | 0 |
| Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval | | 0 |
| QADiver: Interactive Framework for Diagnosing QA Models | | 0 |
| QA Domain Adaptation using Data Augmentation and Contrastive Adaptation | | 0 |
| Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks | | 0 |
Page 154 of 217

Benchmark Results

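| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |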