Question Answering

Question answering can be divided into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
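EM and F1 are usually computed per question by comparing the predicted answer string against each gold answer and keeping the best score. The dataset-level figure is the average over questions. The sketch below assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the English articles a/an/the, collapsing whitespace). The function names are illustrative and not taken from any particular library.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 over the bag of normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Empty answers (e.g. unanswerable questions): full credit only if both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list[str]) -> float:
    """Score against every gold answer and keep the maximum."""
    return max(metric(prediction, gold) for gold in golds)
```

For example, `exact_match("the Eiffel Tower", "Eiffel Tower")` is 1.0 because the article is stripped during normalization, while `f1_score("Eiffel Tower", "Eiffel Tower in Paris")` gives partial credit (precision 1, recall 0.5, so F1 is about 0.67).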


Papers


Showing 426–450 of 10817 papers

Title	Date	Tasks	Status	Hype
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks	Jan 5, 2024	Arithmetic Reasoning, Code Generation	Code Available	2
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling	Oct 8, 2024	document understanding, Language Modeling	Code Available	2
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models	Dec 30, 2024	Question Answering, Token Reduction	Code Available	2
Pengi: An Audio Language Model for Audio Tasks	May 19, 2023	Audio captioning, Audio Question Answering	Code Available	2
Perception Test: A Diagnostic Benchmark for Multimodal Models	Oct 19, 2022	Diagnostic, Multiple-choice	Code Available	2
Ask Me Anything: A simple strategy for prompting language models	Oct 5, 2022	Coreference Resolution, Natural Language Inference	Code Available	2
FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models	Apr 20, 2024	Binary Classification, Fake Image Detection	Code Available	2
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models	Feb 21, 2024	Question Answering	Code Available	2
Compressing Context to Enhance Inference Efficiency of Large Language Models	Oct 9, 2023	Articles, Question Answering	Code Available	2
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents	Feb 21, 2024	Active Learning, Position	Code Available	2
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator	Feb 15, 2024	Benchmarking, Diagnostic	Code Available	2
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	Oct 23, 2019	Answer Generation, Common Sense Reasoning	Code Available	2
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations	Sep 26, 2019	Common Sense Reasoning, GPU	Code Available	2
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis	Sep 10, 2024	Contrastive Learning, Cross-Modal Retrieval	Code Available	2
Atlas: Few-shot Learning with Retrieval Augmented Language Models	Aug 5, 2022	Fact Checking, Few-Shot Learning	Code Available	2
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding	May 21, 2024	Property Prediction, Question Answering	Code Available	2
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework	Jun 20, 2024	Hallucination, Question Answering	Code Available	2
A Simple Aerial Detection Baseline of Multimodal Language Models	Jan 16, 2025	object-detection, Object Detection	Code Available	2
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization	Mar 11, 2022	image-classification, Image Classification	Code Available	2
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model	Aug 2, 2022	Causal Language Modeling, Common Sense Reasoning	Code Available	2
Evaluating LLM Reasoning in the Operations Research Domain with ORQA	Dec 22, 2024	Question Answering	Code Available	2
Explore the Limits of Omni-modal Pretraining at Scale	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2
FreeVA: Offline MLLM as Training-Free Video Assistant	May 13, 2024	Fairness, Question Answering	Code Available	2
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis	Mar 11, 2024	Question Answering	Code Available	2
A Replication Study of Dense Passage Retriever	Apr 12, 2021	Open-Domain Question Answering, Question Answering	Code Available	2



Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified