| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8KKnowledge Graphs | —Unverified | 0 |
| Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | Oct 17, 2023 | Mathematical ReasoningSentiment Analysis | —Unverified | 0 |
| Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction FollowingMathematical Reasoning | —Unverified | 0 |
| Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads | Jun 22, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments | Dec 16, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Let's Reinforce Step by Step | Nov 10, 2023 | GSM8KLogical Reasoning | —Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code GenerationMath | —Unverified | 0 |
| A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | Jul 16, 2024 | Mathematical ReasoningQuestion Answering | —Unverified | 0 |
| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | Oct 6, 2024 | BenchmarkingMathematical Reasoning | —Unverified | 0 |