| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| S*: Test Time Scaling for Code Generation | Feb 20, 2025 | Code Generation, Math | Code Available | 7 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Feb 20, 2025 | Code Completion, Math | Code Available | 1 |
| Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Feb 20, 2025 | Math, reinforcement-learning | Code Available | 7 |
| GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Feb 20, 2025 | Code Generation, Math | Code Available | 0 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 |
| A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | Feb 20, 2025 | Math | Unverified | 0 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Feb 19, 2025 | GSM8K, Math | Code Available | 2 |
| BeamLoRA: Beam-Constraint Low-Rank Adaptation | Feb 19, 2025 | Code Generation, Math | Unverified | 0 |