| Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | Feb 18, 2024 | Mathematical Reasoning, Multi-hop Question Answering | Code available | 1 |
| Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Jun 13, 2024 | Mathematical Reasoning, Question Answering | Code available | 1 |
| REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | May 27, 2025 | Language Modeling, Language Modelling | Code available | 1 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | Diversity, GSM8K | Unverified | 0 |
| From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Feb 19, 2025 | Diagnostic, GSM8K | Unverified | 0 |
| From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Sep 6, 2024 | Machine Translation, Mathematical Reasoning | Unverified | 0 |
| A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers | May 21, 2023 | Mathematical Reasoning | Unverified | 0 |
| Keep Guessing? When Considering Inference Scaling, Mind the Baselines | Oct 20, 2024 | Mathematical Reasoning | Unverified | 0 |
| Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs | Feb 12, 2024 | 2k, Mathematical Reasoning | Unverified | 0 |
| Formal Mathematical Reasoning: A New Frontier in AI | Dec 20, 2024 | Automated Theorem Proving, Math | Unverified | 0 |