| Training Compute-Optimal Large Language Models | Mar 29, 2022 | Anachronisms, Analogical Similarity | Code Available | 6 | 5 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Aug 12, 2024 | GSM8K, Math | Code Available | 4 | 5 |
| PaLM: Scaling Language Modeling with Pathways | Apr 5, 2022 | Auto Debugging, Code Generation | Code Available | 2 | 5 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra, Anachronisms | Code Available | 2 | 5 |
| Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation | Feb 21, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Mar 3, 2024 | Claim Verification, Graph Question Answering | Code Available | 1 | 5 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Mar 21, 2022 | ARC, Arithmetic Reasoning | Code Available | 1 | 5 |
| AutoReason: Automatic Few-Shot Reasoning Decomposition | Dec 9, 2024 | StrategyQA | Code Available | 1 | 5 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| Improving Planning with Large Language Models: A Modular Agentic Architecture | Sep 30, 2023 | In-Context Learning, Reinforcement Learning (RL) | Code Available | 1 | 5 |
| Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks | May 28, 2023 | MedQA, Memorization | Code Available | 1 | 5 |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | May 23, 2024 | Computational Efficiency, GSM8K | Code Available | 1 | 5 |
| Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies | Jan 6, 2021 | Question Answering, StrategyQA | Code Available | 1 | 5 |
| Visconde: Multi-document QA with GPT-3 and Neural Reranking | Dec 19, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Voting or Consensus? Decision-Making in Multi-Agent Debate | Feb 26, 2025 | Decision Making, MMLU | Code Available | 0 | 5 |
| Distilling Reasoning Capabilities into Smaller Language Models | Dec 1, 2022 | GSM8K, Knowledge Distillation | Code Available | 0 | 5 |
| Rationale-Aware Answer Verification by Pairwise Self-Evaluation | Oct 7, 2024 | ARC, StrategyQA | Code Available | 0 | 5 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8K, Logical Reasoning | Code Available | 0 | 5 |
| Tailoring Self-Rationalizers with Multi-Reward Distillation | Nov 6, 2023 | Diversity, Question Answering | Code Available | 0 | 5 |
| Teaching Smaller Language Models To Generalise To Unseen Compositional Questions | Aug 2, 2023 | ARC, Information Retrieval | Code Available | 0 | 5 |
| Meta-prompting Optimized Retrieval-augmented Generation | Jul 4, 2024 | Multi-hop Question Answering, Question Answering | —Unverified | 0 | 0 |
| A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions | Sep 30, 2024 | Prompt Engineering, StrategyQA | —Unverified | 0 | 0 |
| Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval | Aug 9, 2023 | ARC, Language Modelling | —Unverified | 0 | 0 |
| Better Retrieval May Not Lead to Better Question Answering | May 7, 2022 | Information Retrieval, Open-Domain Question Answering | —Unverified | 0 | 0 |
| Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models | May 23, 2023 | Logical Reasoning, StrategyQA | —Unverified | 0 | 0 |
| Dialectical Behavior Therapy Approach to LLM Prompting | Oct 10, 2024 | GSM8K, StrategyQA | —Unverified | 0 | 0 |
| Fusing Bidirectional Chains of Thought and Reward Mechanisms A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage | May 13, 2025 | Knowledge Distillation, Large Language Model | —Unverified | 0 | 0 |
| IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions | Nov 30, 2023 | Knowledge Distillation, RAG | —Unverified | 0 | 0 |
| Improving Attributed Text Generation of Large Language Models via Preference Learning | Mar 27, 2024 | Misinformation, Retrieval | —Unverified | 0 | 0 |
| Large Language Models Are Also Good Prototypical Commonsense Reasoners | Sep 22, 2023 | StrategyQA | —Unverified | 0 | 0 |
| Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts | Oct 30, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | Jun 29, 2024 | Binary Classification, GSM8K | —Unverified | 0 | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | —Unverified | 0 | 0 |
| Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | Jul 4, 2024 | GSM8K, StrategyQA | —Unverified | 0 | 0 |
| Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | Mar 14, 2025 | Checkmate In One, GSM8K | —Unverified | 0 | 0 |
| Self-Evaluation Guided Beam Search for Reasoning | May 1, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | May 19, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| Towards Uncertainty-Aware Language Agent | Jan 25, 2024 | MMLU, StrategyQA | —Unverified | 0 | 0 |
| Unraveling Indirect In-Context Learning Using Influence Functions | Jan 1, 2025 | In-Context Learning, Informativeness | —Unverified | 0 | 0 |