| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | Feb 27, 2025 | GSM8KHumanEval | —Unverified | 0 | 0 |
| Large Language Models Can Self-Improve | Oct 20, 2022 | Arithmetic ReasoningCommon Sense Reasoning | —Unverified | 0 | 0 |
| LiteSearch: Efficacious Tree Search for LLM | Jun 29, 2024 | GSM8KMathematical Reasoning | —Unverified | 0 | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8KHumanEval | —Unverified | 0 | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | Sep 25, 2024 | ChatbotGSM8K | —Unverified | 0 | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8KMath | —Unverified | 0 | 0 |
| Large Language Models as Analogical Reasoners | Oct 3, 2023 | Code GenerationGSM8K | —Unverified | 0 | 0 |
| KwaiYiiMath: Technical Report | Oct 11, 2023 | Arithmetic ReasoningGSM8K | —Unverified | 0 | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | Nov 7, 2024 | GSM8KMathematical Problem-Solving | —Unverified | 0 | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | May 14, 2024 | GSM8KMath | —Unverified | 0 | 0 |