| HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection | May 1, 2025 | Extractive Question-Answering, Hallucination | Unverified | 0 |
| When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD | Mar 24, 2025 | Adversarial Robustness, Extractive Question-Answering | Unverified | 0 |
| On Mechanistic Circuits for Extractive Question-Answering | Feb 12, 2025 | Extractive Question-Answering, Language Modeling | Unverified | 0 |
| FoQA: A Faroese Question-Answering Dataset | Feb 11, 2025 | Articles, Extractive Question-Answering | Unverified | 0 |
| AmaSQuAD: A Benchmark for Amharic Extractive Question Answering | Feb 4, 2025 | Extractive Question-Answering, Question Answering | Unverified | 0 |
| Passage Segmentation of Documents for Extractive Question Answering | Jan 17, 2025 | Chunking, Extractive Question-Answering | Unverified | 0 |
| SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction | Dec 5, 2024 | Articles, Dataset Generation | Code Available | 0 |
| Learning-to-Defer for Extractive Question Answering | Oct 21, 2024 | Computational Efficiency, Decision Making | Unverified | 0 |
| Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology | Sep 29, 2024 | Extractive Question-Answering, Question Answering | Code Available | 0 |
| Exploring Language Model Generalization in Low-Resource Extractive QA | Sep 27, 2024 | Domain Generalization, Extractive Question-Answering | Code Available | 0 |