| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | May 12, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Nov 16, 2023 | Math, Math Word Problem Solving | Code Available | 1 |
| JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding | Jun 13, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | Jan 30, 2025 | Automated Theorem Proving, Math | Code Available | 1 |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Sep 17, 2024 | Active Learning, Diversity | Code Available | 1 |
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Feb 17, 2025 | Code Generation, HumanEval | Code Available | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8K, Math | Code Available | 1 |
| Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Jun 5, 2023 | Math | Code Available | 1 |
| Efficient Process Reward Model Training via Active Learning | Apr 14, 2025 | Active Learning, Math | Code Available | 1 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 |
| Non-myopic Generation of Language Models for Reasoning and Planning | Oct 22, 2024 | Computational Efficiency, Language Modelling | Code Available | 1 |
| Implicit Chain of Thought Reasoning via Knowledge Distillation | Nov 2, 2023 | Knowledge Distillation, Math | Code Available | 1 |
| DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents | Nov 16, 2023 | Math | Code Available | 1 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Mar 2, 2024 | Math, Misconceptions | Code Available | 1 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Feb 20, 2025 | Code Completion, Math | Code Available | 1 |
| How well do Large Language Models perform in Arithmetic tasks? | Mar 16, 2023 | Math | Code Available | 1 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Jun 18, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| HARP: A challenging human-annotated math reasoning benchmark | Dec 11, 2024 | Math | Code Available | 1 |
| CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning | Aug 10, 2022 | Math, Mathematical Reasoning | Code Available | 1 |
| Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Apr 27, 2025 | GSM8K, Math | Code Available | 1 |
| Injecting Numerical Reasoning Skills into Language Models | Apr 9, 2020 | Data Augmentation, Decoder | Code Available | 1 |
| Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Apr 7, 2020 | Decoder, Machine Translation | Code Available | 1 |
| Graph-to-Tree Learning for Solving Math Word Problems | Jul 1, 2020 | Decoder, Math | Code Available | 1 |
| Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning | Sep 29, 2022 | Logical Reasoning, Math | Code Available | 1 |