| Energy-Based Transformers are Scalable Learners and Thinkers | Jul 2, 2025 | DenoisingImage Denoising | CodeCode Available | 4 |
| SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | Oct 11, 2024 | GSM8KMath | CodeCode Available | 4 |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Mar 13, 2025 | Domain GeneralizationMath | CodeCode Available | 4 |
| Lean Workbook: A large-scale Lean problem set formalized from natural language math problems | Jun 6, 2024 | Automated Theorem ProvingMath | CodeCode Available | 4 |
| Let's Verify Step by Step | May 31, 2023 | Active LearningMath | CodeCode Available | 4 |
| Dive into Deep Learning | Jun 21, 2021 | Deep LearningMath | CodeCode Available | 4 |
| ReFT: Reasoning with Reinforced Fine-Tuning | Jan 17, 2024 | GSM8KMath | CodeCode Available | 4 |
| InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Oct 21, 2024 | Automated Theorem ProvingCPU | CodeCode Available | 4 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Feb 9, 2024 | Data AugmentationGSM8K | CodeCode Available | 4 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Jan 4, 2024 | Instruction FollowingMath | CodeCode Available | 4 |
| CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Feb 11, 2025 | Code GenerationMath | CodeCode Available | 4 |
| How is ChatGPT's behavior changing over time? | Jul 18, 2023 | Code GenerationLanguage Modelling | CodeCode Available | 4 |
| MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | May 19, 2025 | MathMathematical Reasoning | CodeCode Available | 4 |
| Skywork Open Reasoner 1 Technical Report | May 28, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 4 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem ProvingMath | CodeCode Available | 4 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Feb 11, 2025 | Automated Theorem ProvingLarge Language Model | CodeCode Available | 3 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Feb 10, 2025 | Math | CodeCode Available | 3 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | AllMath | CodeCode Available | 3 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Aug 1, 2024 | MathMM-Vet | CodeCode Available | 3 |
| Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks | Nov 22, 2022 | Math | CodeCode Available | 3 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic ReasoningGSM8K | CodeCode Available | 3 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | May 1, 2024 | ARCGSM8K | CodeCode Available | 3 |
| MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | May 13, 2024 | Data AugmentationGSM8K | CodeCode Available | 3 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Feb 8, 2024 | Language ModellingMath | CodeCode Available | 3 |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Apr 3, 2024 | GPUMath | CodeCode Available | 3 |