SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated with exact match (EM) and F1 score. Recent top-performing models include T5 and XLNet.
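EM and F1 are easiest to understand in code. The following is a minimal sketch modeled loosely on the SQuAD-style evaluation. It assumes answers are normalized by lowercasing, stripping punctuation and the articles "a", "an", and "the", and collapsing whitespace. It also assumes a prediction is scored against every reference answer and the best score is kept. The function names are hypothetical, and this is not a drop-in replacement for any official evaluation script.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 (harmonic mean of precision and recall) after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    if not pred_tokens or not gold_tokens:
        # Assumed convention: both empty is a match, only one empty is a miss.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, ground_truths):
    """Score the prediction against each reference answer and keep the maximum."""
    return max(metric(prediction, gt) for gt in ground_truths)


if __name__ == "__main__":
    refs = ["Eiffel Tower", "the Eiffel Tower in Paris"]
    pred = "Tower in Paris"
    print(best_over_references(exact_match, pred, refs))
    print(round(best_over_references(f1_score, pred, refs), 3))
```

Running it prints `0.0` and then `0.857`. No reference matches exactly. However, "Tower in Paris" shares 3 of the 4 tokens in the second reference, giving precision 1 and recall 3/4. F1 therefore credits the partial overlap that EM ignores.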

(Image credit: SQuAD)

Papers

Showing 10176–10200 of 10817 papers

Title | Status | Hype
On Curriculum Learning for Commonsense Reasoning | Code | 0
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Code | 0
Mamba Fusion: Learning Actions Through Questioning | Code | 0
Rotational Unit of Memory | Code | 0
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Code | 0
Rematch: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity | Code | 0
EEE-QA: Exploring Effective and Efficient Question-Answer Representations | Code | 0
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models | Code | 0
CERET: Cost-Effective Extrinsic Refinement for Text Generation | Code | 0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | Code | 0
Prompt-based Zero-shot Relation Extraction with Semantic Knowledge Augmentation | Code | 0
EaSe: A Diagnostic Tool for VQA based on Answer Diversity | Code | 0
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0
CAVE: Correcting Attribute Values in E-commerce Profiles | Code | 0
Causal Question Answering with Reinforcement Learning | Code | 0
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge Graphs | Code | 0
DyREx: Dynamic Query Representation for Extractive Question Answering | Code | 0
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Code | 0
Mapping distributional to model-theoretic semantic spaces: a baseline | Code | 0
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification | Code | 0
CausalQA: A Benchmark for Causal Question Answering | Code | 0
Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs | Code | 0
Dynamic Memory Networks for Visual and Textual Question Answering | Code | 0
One-shot Learning for Question-Answering in Gaokao History Challenge | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified