SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated on metrics such as exact match (EM) and F1 score. Top-performing models include T5 and XLNet.
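EM and F1 are usually computed per question and then averaged over the dataset. Below is a minimal Python sketch, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace). The function names are illustrative and not taken from any official evaluation script, and other benchmarks may normalize answers differently.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # An empty ("no answer") string only matches another empty string.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection: each shared token counts min(pred, gold) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, golds: list) -> float:
    """Score the prediction against every reference answer; keep the maximum."""
    return max(metric(prediction, g) for g in golds)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower!", "eiffel tower"))
    print(token_f1("Eiffel Tower in Paris", "the Eiffel Tower"))
```

Taking the maximum over reference answers matters when a dataset lists several acceptable answers per question. For example, "Eiffel Tower in Paris" against "the Eiffel Tower" shares two tokens, giving precision 0.5, recall 1.0, and F1 of about 0.67, while EM is 0.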

(Image credit: SQuAD)

Papers

Showing 3051–3075 of 10,817 papers

Title | Status | Hype
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding | | 0
DynRank: Improving Passage Retrieval with Dynamic Zero-Shot Prompting Based on Question Classification | | 0
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models | | 0
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | | 0
An Overview Of Temporal Commonsense Reasoning and Acquisition | | 0
Dynamic Stochastic Decoding Strategy for Open-Domain Dialogue Generation | | 0
Biomedical Document Retrieval for Clinical Decision Support System | | 0
EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation | | 0
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | | 0
Biomedical Question Answering: A Survey of Approaches and Challenges | | 0
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues | | 0
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models | | 0
Dynamic Relevance Graph Network for Knowledge-Aware Question Answering | | 0
Biomedical Question Answering via Weighted Neural Network Passage Retrieval | | 0
Easy Questions First? A Case Study on Curriculum Learning for Question Answering | | 0
Biomedical/Clinical NLP | | 0
Dynamic Q&A of Clinical Documents with Large Language Models | | 0
Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes | | 0
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | | 0
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering | | 0
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions | | 0
Evaluating the Robustness of Machine Reading Comprehension Models to Low Resource Entity Renaming | | 0
Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks | | 0
Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering | | 0
DynamicMind: A Tri-Mode Thinking System for Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified