SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
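As a rough illustration of the two metrics named above, the following sketch computes exact match and token-level F1 for a single prediction against a single reference answer. It follows the general style of SQuAD-like evaluation; the normalization steps (lowercasing, stripping punctuation and English articles) and the function names `normalize_answer`, `exact_match`, and `f1_score` are assumptions made for this example, and individual benchmarks may define the details differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors common SQuAD-style normalization; other
    benchmarks may normalize differently.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

When a question has several reference answers, a common convention is to take the maximum score over the references and then average over all questions; leaderboards typically report that average as a percentage.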


Papers

Showing 4426-4450 of 10817 papers

| Title | Status | Hype |
|---|---|---|
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | Code | 2 |
| Language Models with Rationality | | 0 |
| HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision | | 0 |
| AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web | Code | 1 |
| FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering | | 0 |
| REFinD: Relation Extraction Financial Dataset | Code | 0 |
| How Language Model Hallucinations Can Snowball | Code | 1 |
| A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond | | 0 |
| Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision | Code | 0 |
| Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph | Code | 0 |
| Teaching Probabilistic Logical Reasoning to Transformers | Code | 0 |
| MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering | Code | 1 |
| Beneath Surface Similarity: Large Language Models Make Reasonable Scientific Analogies after Structure Abduction | Code | 0 |
| LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities | Code | 2 |
| VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0 |
| Evaluating Open-QA Evaluation | Code | 1 |
| Pruning Pre-trained Language Models with Principled Importance and Self-regularization | Code | 0 |
| Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies | | 0 |
| TheoremQA: A Theorem-driven Question Answering dataset | Code | 1 |
| Model Analysis & Evaluation for Ambiguous Question Answering | Code | 0 |
| Continually Improving Extractive QA via Human Feedback | Code | 0 |
| Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios | Code | 0 |
| VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | Code | 1 |
| What Makes for Good Visual Tokenizers for Large Language Models? | Code | 1 |
| Pengi: An Audio Language Model for Audio Tasks | Code | 2 |
Page 178 of 433

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |