SOTAVerified

Open-Domain Question Answering

Open-domain question answering is the task of answering questions using a large open-domain corpus, such as Wikipedia, rather than a passage supplied with the question.

Papers

Showing 51–100 of 494 papers

Title | Status | Hype
Entropy-Based Decoding for Retrieval-Augmented Large Language Models | Code | 0
QPaug: Question and Passage Augmentation for Open-Domain Question Answering of LLMs | Code | 0
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | - | 0
RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation | Code | 0
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation | - | 0
SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing | - | 0
Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models | Code | 0
Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models | Code | 0
Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action | - | 0
Accurate and Nuanced Open-QA Evaluation Through Textual Entailment | Code | 0
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings | - | 0
Large Language Models Can Self-Correct with Key Condition Verification | - | 0
TANQ: An open domain dataset of table answered questions | Code | 1
Improving Long Text Understanding with Knowledge Distilled from Summarization Model | - | 0
Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation | - | 0
Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization | - | 0
Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding | Code | 1
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index | Code | 0
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | Code | 0
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models | Code | 0
Spiral of Silence: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering | Code | 1
Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval | - | 0
KazQAD: Kazakh Open-Domain Question Answering Dataset | Code | 0
Multi-Granularity Guided Fusion-in-Decoder | Code | 1
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts | Code | 0
Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization | - | 0
Denoising Table-Text Retrieval for Open-Domain Question Answering | Code | 0
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1
Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | Code | 2
Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering | Code | 0
FIT-RAG: Black-Box RAG with Factual Information and Token Reduction | - | 0
Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering | - | 0
DESIRE-ME: Domain-Enhanced Supervised Information REtrieval using Mixture-of-Experts | Code | 0
Beyond Memorization: The Challenge of Random Memory Access in Language Models | Code | 1
Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering | Code | 1
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering | Code | 1
Answerability in Retrieval-Augmented Open-Domain Question Answering | - | 0
Automatic Question-Answer Generation for Long-Tail Knowledge | - | 0
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models | - | 0
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering | Code | 1
Pre-training Cross-lingual Open Domain Question Answering with Large-scale Synthetic Supervision | Code | 0
RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering | Code | 2
Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions | - | 0
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence | Code | 2
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering | - | 0
MURRE: Multi-Hop Table Retrieval with Removal for Open-Domain Text-to-SQL | Code | 0
A Dataset of Open-Domain Question Answering with Multiple-Span Answers | - | 0
ITINERA: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning | Code | 2
VerAs: Verify then Assess STEM Lab Reports | Code | 0
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | somebody | KILT-RL | 2.62 | - | Unverified
2 | Wikipedia | KILT-RL | 2.46 | - | Unverified
3 | arxiv.org/abs/2103.06332 | KILT-RL | 2.36 | - | Unverified
4 | BART + DPR | KILT-RL | 1.9 | - | Unverified
5 | RAG | KILT-RL | 1.69 | - | Unverified
6 | T5-base | KILT-RL | 0 | - | Unverified
7 | GENRE | KILT-RL | 0 | - | Unverified
8 | Multi-task DPR | KILT-RL | 0 | - | Unverified
9 | BART | KILT-RL | 0 | - | Unverified
10 | Training Set Retrieval (top 1) | KILT-RL | 0 | - | Unverified

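# | Model | Metric | Claimed | Verified | Status
1 | Re2G | KILT-EM | 43.56 | - | Unverified
2 | intersect | KILT-EM | 38.78 | - | Unverified
3 | KGI_0 | KILT-EM | 36.36 | - | Unverified
4 | Wikipedia | KILT-EM | 35.32 | - | Unverified
5 | RAG | KILT-EM | 32.69 | - | Unverified
6 | BERT + DPR | KILT-EM | 31.99 | - | Unverified
7 | BART + DPR | KILT-EM | 30.06 | - | Unverified
8 | Multitask DPR + BART | KILT-EM | 29.09 | - | Unverified
9 | Sphere | KILT-EM | 0 | - | Unverified
10 | T5-base | KILT-EM | 0 | - | Unverified
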
# | Model | Metric | Claimed | Verified | Status
1 | Re2G | KILT-EM | 57.91 | - | Unverified
2 | intersect | KILT-EM | 50.56 | - | Unverified
3 | Wikipedia | KILT-EM | 45.55 | - | Unverified
4 | KGI_0 | KILT-EM | 42.85 | - | Unverified
5 | Multitask DPR + BART | KILT-EM | 42.36 | - | Unverified
6 | RAG | KILT-EM | 38.13 | - | Unverified
7 | BERT + DPR | KILT-EM | 34.48 | - | Unverified
8 | BART + DPR | KILT-EM | 31.4 | - | Unverified
9 | Multi-task DPR | KILT-EM | 0 | - | Unverified
10 | Sphere | KILT-EM | 0 | - | Unverified

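# | Model | Metric | Claimed | Verified | Status
1 | intersect | KILT-EM | 18.06 | - | Unverified
2 | Wikipedia | KILT-EM | 11.71 | - | Unverified
3 | Multitask DPR + BART | KILT-EM | 9.53 | - | Unverified
4 | RAG | KILT-EM | 3.21 | - | Unverified
5 | BART + DPR | KILT-EM | 1.96 | - | Unverified
6 | BERT + DPR | KILT-EM | 0.74 | - | Unverified
7 | Sphere | KILT-EM | 0 | - | Unverified
8 | Multi-task DPR | KILT-EM | 0 | - | Unverified
9 | GENRE | KILT-EM | 0 | - | Unverified
10 | chriskuei | KILT-EM | 0 | - | Unverified
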
# | Model | Metric | Claimed | Verified | Status
1 | SpanBERT | F1 | 84.8 | - | Unverified
2 | Cluster-Former (#C=512) | EM | 68 | - | Unverified
3 | Locality-Sensitive Hashing | EM | 66 | - | Unverified
4 | Multi-passage BERT | EM | 65.1 | - | Unverified
5 | Sparse Attention | EM | 64.7 | - | Unverified
6 | DECAPROP | EM | 62.2 | - | Unverified
7 | Bi-Attention + DCU-LSTM | N-gram F1 | 59.5 | - | Unverified
8 | Denoising QA | EM | 58.8 | - | Unverified
9 | DecaProp | EM | 56.8 | - | Unverified
10 | AMANDA | N-gram F1 | 56.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fourier Transformer | Rouge-L | 26.9 | - | Unverified
2 | QG | Rouge-L | 26.4 | - | Unverified
3 | BART | Rouge-L | 24.3 | - | Unverified
4 | E-MCA | Rouge-L | 24 | - | Unverified
5 | Transformer Multitask + LayerDrop | Rouge-L | 23.4 | - | Unverified
6 | Multi-Inrerleave | Rouge-L | 14.63 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Evidence Aggregation via R^3 Re-Ranking | EM (Quasar-T) | 42.3 | - | Unverified
2 | Denoising QA | EM (Quasar-T) | 42.2 | - | Unverified
3 | DecaProp | EM (Quasar-T) | 38.6 | - | Unverified
4 | R^3 | EM (Quasar-T) | 35.3 | - | Unverified
5 | GA | EM (Quasar-T) | 26.4 | - | Unverified
6 | BiDAF | EM (Quasar-T) | 25.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FiE | Exact Match | 58.4 | - | Unverified
2 | R2-D2 HN-DPR | Exact Match | 55.9 | - | Unverified
3 | UniK-QA | Exact Match | 54.9 | - | Unverified
4 | UnitedQA (Hybrid) | Exact Match | 54.7 | - | Unverified
5 | BPR (linear scan; l=1000) | Exact Match | 41.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SPARTA | EM | 59.3 | - | Unverified
2 | Blended RAG | EM | 57.63 | - | Unverified
3 | BERTserini | EM | 50.2 | - | Unverified
4 | BERTserini | EM | 38.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UniK-QA | Exact Match | 57.7 | - | Unverified
2 | FiE+PAQ | Exact Match | 56.3 | - | Unverified
3 | FiE | Exact Match | 52.4 | - | Unverified
4 | EMDR2 | Exact Match | 48.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DrQA | EM | 70 | - | Unverified
2 | DCN | EM | 66.2 | - | Unverified
3 | MPCM | EM | 65.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ERNIE 2.0 Large | EM | 64.2 | - | Unverified
2 | ERNIE 2.0 Base | EM | 61.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UniK-QA | Exact Match | 65.5 | - | Unverified
2 | BPR (linear scan; l=1000) | Exact Match | 56.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EMDR2 | Exact Match | 52.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UnitedQA (Hybrid) | Exact Match | 70.5 | - | Unverified