SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
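
As a rough illustration of the two metrics named above, the sketch below computes exact match and token-level F1 for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); other benchmarks may normalize differently or score against several references. The function names `normalize`, `exact_match`, and `f1_score` are illustrative and not taken from any particular library.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors SQuAD-style normalization; other datasets may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower!")` yields 1.0 because both strings normalize to the same text, while `f1_score("in Paris France", "Paris")` gives partial credit for the one shared token. Benchmark scores are usually the average of these per-question values over the whole evaluation set.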


Papers

Showing 4401–4450 of 10817 papers

Title | Status | Hype
Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks | | 0
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM | Code | 0
Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach | | 0
Pre-training Language Models for Comparative Reasoning | | 0
Few-shot Unified Question Answering: Tuning Models or Prompts? | | 0
Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions | Code | 0
Sources of Hallucination by Large Language Models on Inference Tasks | Code | 1
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning | Code | 1
BAND: Biomedical Alert News Dataset | Code | 0
Knowledge Graphs Querying | | 0
Asking Clarification Questions to Handle Ambiguity in Open-Domain QA | Code | 1
Evaluating and Modeling Attribution for Cross-Lingual Question Answering | | 0
Question Answering as Programming for Solving Time-Sensitive Questions | Code | 1
Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering | | 0
RET-LLM: Towards a General Read-Write Memory for Large Language Models | Code | 6
On the Risk of Misinformation Pollution with Large Language Models | Code | 1
MemeCap: A Dataset for Captioning and Interpreting Memes | Code | 1
Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database | | 0
i-Code Studio: A Configurable and Composable Framework for Integrative AI | | 0
Continual Dialogue State Tracking via Example-Guided Question Answering | Code | 0
IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions | | 0
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models | Code | 0
Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata | Code | 1
Make a Choice! Knowledge Base Question Answering with In-Context Learning | | 0
DUBLIN -- Document Understanding By Language-Image Network | | 0
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | Code | 2
Language Models with Rationality | | 0
HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision | | 0
AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web | Code | 1
FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering | | 0
REFinD: Relation Extraction Financial Dataset | Code | 0
How Language Model Hallucinations Can Snowball | Code | 1
A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond | | 0
Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision | Code | 0
Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph | Code | 0
Teaching Probabilistic Logical Reasoning to Transformers | Code | 0
MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering | Code | 1
Beneath Surface Similarity: Large Language Models Make Reasonable Scientific Analogies after Structure Abduction | Code | 0
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities | Code | 2
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0
Evaluating Open-QA Evaluation | Code | 1
Pruning Pre-trained Language Models with Principled Importance and Self-regularization | Code | 0
Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies | | 0
TheoremQA: A Theorem-driven Question Answering dataset | Code | 1
Model Analysis & Evaluation for Ambiguous Question Answering | Code | 0
Continually Improving Extractive QA via Human Feedback | Code | 0
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios | Code | 0
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | Code | 1
What Makes for Good Visual Tokenizers for Large Language Models? | Code | 1
Pengi: An Audio Language Model for Audio Tasks | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified