| Let's Verify Step by Step | May 31, 2023 | Active LearningMath | CodeCode Available | 4 |
| Reasoning with Language Model is Planning with World Model | May 24, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 4 |
| Galactica: A Large Language Model for Science | Nov 16, 2022 | AnachronismsBias Detection | CodeCode Available | 4 |
| Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | Jun 9, 2022 | Common Sense ReasoningMath | CodeCode Available | 4 |
| Dive into Deep Learning | Jun 21, 2021 | Deep LearningMath | CodeCode Available | 4 |
| Spurious Rewards: Rethinking Training Signals in RLVR | Jun 12, 2025 | MathMathematical Reasoning | CodeCode Available | 3 |
| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | MathMathematical Reasoning | CodeCode Available | 3 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | AllMath | CodeCode Available | 3 |
| Thinkless: LLM Learns When to Think | May 19, 2025 | GSM8KMath | CodeCode Available | 3 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignmentGeometry Problem Solving | CodeCode Available | 3 |