| Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | Dec 22, 2024 | Decision Making, Machine Translation | Code Available | 0 |
| MoD: A Distribution-Based Approach for Merging Large Language Models | Nov 1, 2024 | Mathematical Reasoning | Code Available | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation | Jul 6, 2022 | Mathematical Reasoning | Code Available | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | May 19, 2025 | Decoder, Image Generation | Code Available | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8K, Math | Code Available | 0 |
| Techniques to Improve Neural Math Word Problem Solvers | Feb 6, 2023 | Decoder, Language Modelling | Code Available | 0 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Compositional Generalization with Tree Stack Memory Units | Nov 5, 2019 | Mathematical Reasoning, Zero-shot Generalization | Code Available | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 |