| Halu-J: Critique-Based Hallucination Judge | Jul 17, 2024 | Evidence Selection, Hallucination | Code Available | 4 | 5 |
| Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | Feb 26, 2024 | Evidence Selection, Open-Ended Question Answering | Code Available | 4 | 5 |
| Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | May 22, 2025 | Benchmarking, Evidence Selection | Code Available | 1 | 5 |
| A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers | May 7, 2021 | Evidence Selection, Question Answering | Code Available | 1 | 5 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 | 5 |
| AmbiFC: Fact-Checking Ambiguous Claims with Evidence | Apr 1, 2021 | Claim Verification, Evidence Selection | Code Available | 1 | 5 |
| Evidence Selection as a Token-Level Prediction Task | Nov 1, 2021 | Claim Verification, Evidence Selection | Code Available | 1 | 5 |
| SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images | Jan 12, 2023 | Evidence Selection, Question Answering | Code Available | 1 | 5 |
| Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering | May 4, 2020 | Evidence Selection, Multi-hop Question Answering | Code Available | 1 | 5 |
| MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation | Jul 31, 2019 | Evidence Selection, Meta-Learning | Code Available | 0 | 5 |
| Capturing Global Structural Information in Long Document Question Answering with Compressive Graph Selector Network | Oct 11, 2022 | Evidence Selection, Graph Attention | Code Available | 0 | 5 |
| In-the-Wild Video Question Answering | Oct 1, 2022 | Evidence Selection, Question Answering | Unverified | 0 | 0 |
| Knowledge-Aware Iterative Retrieval for Multi-Agent Systems | Mar 17, 2025 | Evidence Selection, Large Language Model | Unverified | 0 | 0 |
| Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices | Jan 16, 2022 | Answer Generation, Data Augmentation | Unverified | 0 | 0 |
| Product Answer Generation from Heterogeneous Sources: A New Benchmark and Best Practices | May 1, 2022 | Answer Generation, Data Augmentation | Unverified | 0 | 0 |
| SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression | Jul 8, 2025 | Evidence Selection, RAG | Unverified | 0 | 0 |
| Comparing Knowledge Sources for Open-Domain Scientific Claim Verification | Feb 5, 2024 | Claim Verification, Evidence Selection | Unverified | 0 | 0 |
| Complementary Evidence Identification in Open-Domain Question Answering | Mar 22, 2021 | Diversity, Evidence Selection | Unverified | 0 | 0 |
| Do We Need Language-Specific Fact-Checking Models? The Case of Chinese | Jan 27, 2024 | Evidence Selection, Fact Checking | Unverified | 0 | 0 |
| SemEval-2023 Task 7: Multi-Evidence Natural Language Inference for Clinical Trial Data | May 4, 2023 | Evidence Selection, Natural Language Inference | Unverified | 0 | 0 |
| Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology | Sep 20, 2024 | Evidence Selection, Form | Unverified | 0 | 0 |
| WildQA: In-the-Wild Video Question Answering | Sep 14, 2022 | Evidence Selection, Question Answering | Unverified | 0 | 0 |