SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.

(Image credit: SQuAD)
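EM counts a prediction as correct only when it matches a gold answer exactly after light normalization, while F1 gives partial credit for token overlap. The sketch below illustrates the idea, assuming a single gold answer per question; official evaluation scripts additionally take the maximum score over all provided gold answers.

```python
# Minimal sketch of SQuAD-style EM and F1 scoring (single gold answer assumed).
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """EM: 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "Eiffel Tower"))  # 1 after normalization
print(round(f1_score("in Paris, France", "Paris"), 2))  # partial credit: 0.5
```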

Papers

Showing 2021–2030 of 10817 papers

Title | Status | Hype
Large Language Model Driven Recommendation | - | 0
GS-KGC: A Generative Subgraph-based Framework for Knowledge Graph Completion with Large Language Models | - | 0
QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory | - | 0
Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter | Code | 0
Multilingual Non-Factoid Question Answering with Answer Paragraph Selection | Code | 0
V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard? | Code | 1
Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models | Code | 0
Ranking Generated Answers: On the Agreement of Retrieval Models with Humans on Consumer Health Questions | Code | 0
PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Code | 2
How Susceptible are LLMs to Influence in Prompts? | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | - | Unverified
2 | FPNet (ensemble) | EM | 90.87 | - | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | - | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | - | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | - | Unverified
6 | FPNet (ensemble) | EM | 90.6 | - | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | - | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | - | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | - | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | - | Unverified