| S^2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity | Dec 9, 2024 | Arithmetic Reasoning | —Unverified | 0 | 0 |
| Self-Evaluation Guided Beam Search for Reasoning | May 1, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | May 19, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 | 0 |
| SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Jan 28, 2025 | Arithmetic Reasoning; Memorization | —Unverified | 0 | 0 |
| Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | May 21, 2024 | Arithmetic Reasoning; Decision Making | —Unverified | 0 | 0 |
| Small Language Models are Equation Reasoners | Sep 19, 2024 | Arithmetic Reasoning; Knowledge Distillation | —Unverified | 0 | 0 |
| Solving math word problems with process- and outcome-based feedback | Nov 25, 2022 | Arithmetic Reasoning; GSM8K | —Unverified | 0 | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | Feb 20, 2024 | Arithmetic Reasoning; GSM8K | —Unverified | 0 | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | —Unverified | 0 | 0 |