| Memorizing Transformers | Mar 16, 2022 | Language Modeling, Language Modelling | Code Available | 2 |
| Accelerating Sparse Deep Neural Networks | Apr 16, 2021 | GPU, Math | Code Available | 2 |
| Full Page Handwriting Recognition via Image to Sequence Extraction | Mar 11, 2021 | Handwriting Recognition, Handwritten Text Recognition | Code Available | 2 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Mar 5, 2021 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Jul 8, 2025 | Math, MMLU | Code Available | 1 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1 |
| OJBench: A Competition Level Code Benchmark For Large Language Models | Jun 19, 2025 | Math | Code Available | 1 |