| Exploring Group and Symmetry Principles in Large Language Models | Feb 9, 2024 | Arithmetic Reasoning; Negation | —Unverified | 0 |
| Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Apr 30, 2025 | Arithmetic Reasoning; Text to SQL | —Unverified | 0 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Jul 15, 2024 | Arithmetic Reasoning; Language Modeling | —Unverified | 0 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Jun 16, 2025 | Answer Generation; Arithmetic Reasoning | —Unverified | 0 |
| GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection | Dec 15, 2024 | Arithmetic Reasoning; Text Generation | —Unverified | 0 |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | Jun 23, 2023 | Arithmetic Reasoning; Knowledge Distillation | —Unverified | 0 |
| Hint Marginalization for Improved Reasoning in Large Language Models | Dec 17, 2024 | Arithmetic Reasoning | —Unverified | 0 |
| Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | Feb 18, 2025 | Arithmetic Reasoning; Common Sense Reasoning | —Unverified | 0 |
| Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning | May 21, 2025 | Arithmetic Reasoning; Instruction Following | —Unverified | 0 |
| KwaiYiiMath: Technical Report | Oct 11, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| Large Language Models are Null-Shot Learners | Jan 16, 2024 | Arithmetic Reasoning; Benchmarking | —Unverified | 0 |
| Large Language Models Can Self-Correct with Key Condition Verification | May 23, 2024 | Arithmetic Reasoning; Math | —Unverified | 0 |
| Large Language Models Can Self-Improve | Oct 20, 2022 | Arithmetic Reasoning; Common Sense Reasoning | —Unverified | 0 |
| Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Jun 4, 2025 | Arithmetic Reasoning; Reinforcement Learning (RL) | —Unverified | 0 |
| Model Card and Evaluations for Claude Models | Jul 11, 2023 | Arithmetic Reasoning; Bug fixing | —Unverified | 0 |
| Neural-Symbolic Recursive Machine for Systematic Generalization | Oct 4, 2022 | Arithmetic Reasoning; Machine Translation | —Unverified | 0 |
| NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks | Apr 12, 2022 | Arithmetic Reasoning; Mathematical Reasoning | —Unverified | 0 |
| On Representational Dissociation of Language and Arithmetic in Large Language Models | Feb 17, 2025 | Arithmetic Reasoning | —Unverified | 0 |
| Making Large Language Models Better Reasoners with Step-Aware Verifier | Jun 6, 2022 | Arithmetic Reasoning; Few-Shot Learning | —Unverified | 0 |
| Orca 2: Teaching Small Language Models How to Reason | Nov 18, 2023 | Arithmetic Reasoning; Common Sense Reasoning | —Unverified | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | Feb 16, 2024 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | Nov 12, 2024 | Arithmetic Reasoning; Mixture-of-Experts | —Unverified | 0 |
| Prompt Sketching for Large Language Models | Nov 8, 2023 | Arithmetic Reasoning; Benchmarking | —Unverified | 0 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | May 19, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| S^2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity | Dec 9, 2024 | Arithmetic Reasoning | —Unverified | 0 |
| Self-Evaluation Guided Beam Search for Reasoning | May 1, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | May 19, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Jan 28, 2025 | Arithmetic Reasoning; Memorization | —Unverified | 0 |
| Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | May 21, 2024 | Arithmetic Reasoning; Decision Making | —Unverified | 0 |
| Small Language Models are Equation Reasoners | Sep 19, 2024 | Arithmetic Reasoning; Knowledge Distillation | —Unverified | 0 |
| Solving math word problems with process- and outcome-based feedback | Nov 25, 2022 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | Feb 20, 2024 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | —Unverified | 0 |
| The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | Feb 24, 2025 | Arithmetic Reasoning; Common Sense Reasoning | —Unverified | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | Feb 9, 2024 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic Reasoning; Computational Efficiency | —Unverified | 0 |
| Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Arithmetic Reasoning | Dec 2, 2024 | Arithmetic Reasoning | —Unverified | 0 |
| ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning | Apr 9, 2025 | Arithmetic Reasoning; valid | —Unverified | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits | May 20, 2025 | Arithmetic Reasoning | —Unverified | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic Reasoning; Cross-Lingual Question Answering | —Unverified | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | Oct 3, 2024 | Arithmetic Reasoning; GSM8K | —Unverified | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic Reasoning; Image Generation | —Unverified | 0 |
| When do you need Chain-of-Thought Prompting for ChatGPT? | Apr 6, 2023 | Arithmetic Reasoning; Memorization | —Unverified | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | Feb 17, 2025 | Arithmetic Reasoning; Chart Understanding | —Unverified | 0 |
| Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs | Dec 19, 2024 | Arithmetic Reasoning; Code Generation | —Unverified | 0 |
| 0/1 Deep Neural Networks via Block Coordinate Descent | Jun 19, 2022 | 10-shot image generation | —Unverified | 0 |
| Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks | Jul 25, 2024 | Arithmetic Reasoning | —Unverified | 0 |