| Exploring LLM Reasoning Through Controlled Prompt Variations | Apr 2, 2025 | GSM8KMathematical Problem-Solving | CodeCode Available | 0 | 5 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Aug 21, 2023 | GSM8K | CodeCode Available | 0 | 5 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8KHumanEval | CodeCode Available | 0 | 5 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Sep 30, 2024 | GSM8KMath | CodeCode Available | 0 | 5 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Jun 17, 2025 | GSM8KQuestion Answering | CodeCode Available | 0 | 5 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Oct 17, 2024 | GSM8KLanguage Modeling | CodeCode Available | 0 | 5 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8KMath | CodeCode Available | 0 | 5 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Apr 3, 2025 | GSM8KReinforcement Learning (RL) | CodeCode Available | 0 | 5 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8KMath | CodeCode Available | 0 | 5 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Feb 28, 2025 | GSM8K | CodeCode Available | 0 | 5 |