| ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning | Apr 9, 2025 | Arithmetic Reasoning, valid | —Unverified | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits | May 20, 2025 | Arithmetic Reasoning | —Unverified | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic Reasoning, Cross-Lingual Question Answering | —Unverified | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | Oct 3, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic Reasoning, Image Generation | —Unverified | 0 |
| When do you need Chain-of-Thought Prompting for ChatGPT? | Apr 6, 2023 | Arithmetic Reasoning, Memorization | —Unverified | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | Feb 17, 2025 | Arithmetic Reasoning, Chart Understanding | —Unverified | 0 |
| Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs | Dec 19, 2024 | Arithmetic Reasoning, Code Generation | —Unverified | 0 |