| DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images | Jan 24, 2025 | Math | —Unverified | 0 |
| Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Jan 24, 2025 | Math | CodeCode Available | 1 |
| Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages | Jan 23, 2025 | Instruction FollowingMath | —Unverified | 0 |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | Jan 22, 2025 | Mathreinforcement-learning | CodeCode Available | 7 |
| Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Jan 22, 2025 | Math | CodeCode Available | 1 |
| An Optimal Transport approach to arbitrage correction: Application to volatility Stress-Tests | Jan 21, 2025 | Math | —Unverified | 0 |
| Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | Jan 21, 2025 | GSM8KIn-Context Learning | —Unverified | 0 |
| RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems? | Jan 20, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | Jan 20, 2025 | Imitation LearningLanguage Modeling | CodeCode Available | 2 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Jan 19, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |