| Better Process Supervision with Bi-directional Rewarding Signals | Mar 6, 2025 | Language Modeling | Unverified | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Mar 6, 2025 | GSM8K, Math | Unverified | 0 |
| START: Self-taught Reasoner with Tools | Mar 6, 2025 | Math, Self-Learning | Unverified | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | Mar 5, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach | Mar 5, 2025 | Instruction Following, Math | Unverified | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer Selection, Math | Unverified | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | Mar 4, 2025 | GSM8K, Math | Unverified | 0 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 |
| What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret | Mar 3, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Mar 3, 2025 | Math, MMLU | Code Available | 0 |