| Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers | May 19, 2025 | In-Context Learning, Instruction Following | —Unverified | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Let's Reinforce Step by Step | Nov 10, 2023 | GSM8K, Logical Reasoning | —Unverified | 0 |
| Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data | Jun 4, 2024 | Mathematical Reasoning, Text Generation | —Unverified | 0 |
| Can Theoretical Physics Research Benefit from Language Agents? | Jun 6, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | Chatbot, GSM8K | —Unverified | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | Apr 4, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | Sep 13, 2024 | Contrastive Learning, Language Modeling | —Unverified | 0 |
| Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning | May 20, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 |
| LemmaHead: RAG Assisted Proof Generation Using Large Language Models | Jan 27, 2025 | Automated Theorem Proving, Mathematical Proofs | —Unverified | 0 |
| Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | Oct 13, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains | Mar 31, 2025 | Mathematical Reasoning, reinforcement-learning | —Unverified | 0 |
| Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | May 21, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | —Unverified | 0 |
| Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | Dec 5, 2024 | Few-Shot Learning, GSM8K | —Unverified | 0 |
| Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models | Jun 5, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | Oct 16, 2023 | Code Generation, GSM8K | —Unverified | 0 |
| Evaluation of OpenAI o1: Opportunities and Challenges of AGI | Sep 27, 2024 | Emotion Recognition, Large Language Model | —Unverified | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 |
| Can Large Language Models Invent Algorithms to Improve Themselves? | Oct 21, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Feb 14, 2025 | Mathematical Reasoning, Object | —Unverified | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8K, Knowledge Graphs | —Unverified | 0 |
| Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | Oct 17, 2023 | Mathematical Reasoning, Sentiment Analysis | —Unverified | 0 |