| GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | May 30, 2024 | Graph Question Answering, Knowledge Graphs | Code Available | 3 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Apr 25, 2024 | GSM8K, HellaSwag | Code Available | 3 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | Feb 17, 2022 | ARC, Common Sense Reasoning | Code Available | 3 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Jun 18, 2024 | Arithmetic Reasoning, Math | Code Available | 2 |
| JADE: A Linguistics-based Safety Evaluation Platform for Large Language Models | Nov 1, 2023 | Natural Questions | Code Available | 2 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | Aug 5, 2022 | Fact Checking, Few-Shot Learning | Code Available | 2 |
| QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs | May 25, 2022 | Answer Generation, Natural Questions | Code Available | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 |
| Relevance-guided Supervision for OpenQA with ColBERT | Jul 1, 2020 | Natural Questions, Open-Domain Question Answering | Code Available | 2 |
| Constructing and Evaluating Declarative RAG Pipelines in PyTerrier | Jun 12, 2025 | Natural Questions, RAG | Code Available | 1 |