Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1 score. Some recent top-performing models are T5 and XLNet.
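To make the EM and F1 metrics concrete, here is a minimal Python sketch modeled on the answer normalization used by the SQuAD evaluation scripts: lowercase, strip punctuation and the English articles "a", "an", "the", then collapse whitespace. Other benchmarks normalize answers differently, so treat this as an approximation. The function names (`normalize_answer`, `exact_match`, `f1`, `score_dataset`) are illustrative, not a library API.

```python
import re
import string
from collections import Counter

# Characters removed during normalization (SQuAD-style assumption).
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Unanswerable questions (SQuAD 2.0 convention): empty only matches empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts tokens shared by both answers.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score_dataset(predictions, gold_answers):
    """Average of best-over-golds EM and F1, reported as percentages."""
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1(pred, g) for g in golds)
    n = len(predictions)
    return 100.0 * em_total / n, 100.0 * f1_total / n


if __name__ == "__main__":
    preds = ["the Eiffel Tower", "in 1889", ""]
    golds = [["Eiffel Tower"], ["1889", "March 1889"], [""]]
    em, f1_avg = score_dataset(preds, golds)
    print(f"EM = {em:.2f}, F1 = {f1_avg:.2f}")  # prints "EM = 66.67, F1 = 88.89"
```

In the toy example, "in 1889" gets no EM credit but partial F1 credit (2/3 against the gold "1889"), which is why F1 is usually higher than EM on leaderboards. Taking the maximum over all reference answers before averaging mirrors how datasets with multiple gold answers are commonly scored.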

(Image credit: SQuAD)

Papers


Showing 351–400 of 10817 papers

Title	Date	Tasks	Status	Hype
LOVA3: Learning to Visual Question Answering, Asking and Assessment	May 23, 2024	Question Answering, Visual Question Answering	Code Available	2
AGILE: A Novel Reinforcement Learning Framework of LLM Agents	May 23, 2024	Question Answering, reinforcement-learning	Code Available	2
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation	May 22, 2024	Informativeness, Language Modeling	Code Available	2
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding	May 21, 2024	Property Prediction, Question Answering	Code Available	2
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering	May 20, 2024	Benchmarking, Question Answering	Code Available	2
Grounded 3D-LLM with Referent Tokens	May 16, 2024	Dense Captioning, Diversity	Code Available	2
FreeVA: Offline MLLM as Training-Free Video Assistant	May 13, 2024	Fairness, Question Answering	Code Available	2
HMT: Hierarchical Memory Transformer for Long Context Language Processing	May 9, 2024	Language Modeling, Language Modelling	Code Available	2
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature	May 8, 2024	Question Answering	Code Available	2
Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records	May 4, 2024	Information Retrieval, Question Answering	Code Available	2
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages	Apr 25, 2024	Cross-Lingual Question Answering, Diversity	Code Available	2
Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering	Apr 23, 2024	Graph Question Answering, Hallucination	Code Available	2
GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration	Apr 23, 2024	Collaborative Inference, In-Context Learning	Code Available	2
FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models	Apr 20, 2024	Binary Classification, Fake Image Detection	Code Available	2
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models	Apr 16, 2024	image-classification, Image Classification	Code Available	2
LLoCO: Learning Long Contexts Offline	Apr 11, 2024	4k, In-Context Learning	Code Available	2
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation	Apr 10, 2024	Question Answering, RAG	Code Available	2
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks	Apr 9, 2024	Answer Selection, Long-Context Understanding	Code Available	2
LongVLM: Efficient Long Video Understanding via Large Language Models	Apr 4, 2024	Question Answering, Video Question Answering	Code Available	2
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward	Apr 1, 2024	Instruction Following, Language Modeling	Code Available	2
How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library	Mar 31, 2024	Question Answering	Code Available	2
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	Mar 29, 2024	Instruction Following, Language Modelling	Code Available	2
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis	Mar 29, 2024	Hallucination, Image Captioning	Code Available	2
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models	Mar 29, 2024	Question Answering, Visual Question Answering	Code Available	2
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving	Mar 28, 2024	Autonomous Driving, Language Modeling	Code Available	2
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction	Mar 27, 2024	Image Captioning, Language Modeling	Code Available	2
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language Modeling, Language Modelling	Code Available	2
OmniVid: A Generative Framework for Universal Video Understanding	Mar 26, 2024	Action Recognition, Decoder	Code Available	2
Visually Guided Generative Text-Layout Pre-training for Document Intelligence	Mar 25, 2024	Document Classification, document understanding	Code Available	2
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models	Mar 22, 2024	Language Modelling, Large Language Model	Code Available	2
Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers	Mar 22, 2024	Information Retrieval	Code Available	2
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning	Mar 19, 2024	Benchmarking, Image Captioning	Code Available	2
RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems	Mar 14, 2024	Decoder, Question Answering	Code Available	2
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension	Mar 12, 2024	Deblurring, Decoder	Code Available	2
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis	Mar 11, 2024	Question Answering	Code Available	2
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques	Mar 9, 2024	Knowledge Graphs, Long Form Question Answering	Code Available	2
Debiasing Multimodal Large Language Models	Mar 8, 2024	Fairness, Question Answering	Code Available	2
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios	Mar 7, 2024	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available	2
QAQ: Quality Adaptive Quantization for LLM KV Cache	Mar 7, 2024	Quantization, Question Answering	Code Available	2
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning	Mar 6, 2024	Multimodal Reasoning, Question Answering	Code Available	2
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation	Feb 28, 2024	Code Generation, In-Context Learning	Code Available	2
The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA	Feb 28, 2024	Natural Language Understanding, Question Answering	Code Available	2
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey	Feb 27, 2024	Language Modeling, Language Modelling	Code Available	2
BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra	Feb 27, 2024	Question Answering	Code Available	2
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space	Feb 27, 2024	Contrastive Learning, Hallucination	Code Available	2
RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering	Feb 26, 2024	Form, Open-Domain Question Answering	Code Available	2
Data Science with LLMs and Interpretable Models	Feb 22, 2024	Additive models, Question Answering	Code Available	2
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents	Feb 21, 2024	Active Learning, Position	Code Available	2
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models	Feb 21, 2024	Question Answering	Code Available	2
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs	Feb 19, 2024	Question Answering	Code Available	2



Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified