| Halu-J: Critique-Based Hallucination Judge | Jul 17, 2024 | Evidence Selection, Hallucination | Code Available | 4 |
| Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | Feb 26, 2024 | Evidence Selection, Open-Ended Question Answering | Code Available | 4 |
| Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering | May 4, 2020 | Evidence Selection, Multi-hop Question Answering | Code Available | 1 |
| AmbiFC: Fact-Checking Ambiguous Claims with Evidence | Apr 1, 2021 | Claim Verification, Evidence Selection | Code Available | 1 |
| Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | May 22, 2025 | Benchmarking, Evidence Selection | Code Available | 1 |
| SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images | Jan 12, 2023 | Evidence Selection, Question Answering | Code Available | 1 |
| A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers | May 7, 2021 | Evidence Selection, Question Answering | Code Available | 1 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 |
| Evidence Selection as a Token-Level Prediction Task | Nov 1, 2021 | Claim Verification, Evidence Selection | Code Available | 1 |
| Complementary Evidence Identification in Open-Domain Question Answering | Mar 22, 2021 | Diversity, Evidence Selection | Unverified | 0 |