| Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Oct 11, 2024 | GSM8KLanguage Modeling | CodeCode Available | 2 | 5 |
| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | BenchmarkingGSM8K | CodeCode Available | 2 | 5 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Feb 13, 2025 | GSM8K | CodeCode Available | 2 | 5 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8KMath | CodeCode Available | 2 | 5 |
| Preference Optimization for Reasoning with Pseudo Feedback | Nov 25, 2024 | GSM8KMath | CodeCode Available | 2 | 5 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | May 19, 2025 | GSM8KMath | CodeCode Available | 2 | 5 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | May 28, 2022 | Arithmetic ReasoningEfficient Exploration | CodeCode Available | 1 | 5 |
| Learning Goal-Conditioned Representations for Language Reward Models | Jul 18, 2024 | GSM8KMath | CodeCode Available | 1 | 5 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 | 5 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 | 5 |
| Automatic Model Selection with Large Language Models for Reasoning | May 23, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 | 5 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8KIn-Context Learning | CodeCode Available | 1 | 5 |
| CommVQ: Commutative Vector Quantization for KV Cache Compression | Jun 23, 2025 | GPUGSM8K | CodeCode Available | 1 | 5 |
| Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries | Dec 12, 2024 | 4kGSM8K | CodeCode Available | 1 | 5 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Jan 21, 2024 | GSM8KMath | CodeCode Available | 1 | 5 |
| Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Oct 8, 2024 | GSM8KMulti-agent Reinforcement Learning | CodeCode Available | 1 | 5 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Jan 27, 2023 | Few-Shot LearningGSM8K | CodeCode Available | 1 | 5 |
| Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks | Sep 20, 2024 | ARCGSM8K | CodeCode Available | 1 | 5 |
| Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs | Nov 16, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 | 5 |
| Language Models as Science Tutors | Feb 16, 2024 | GSM8KMath | CodeCode Available | 1 | 5 |
| Large Language Models as Optimizers | Sep 7, 2023 | GSM8K | CodeCode Available | 1 | 5 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8KLanguage Model Evaluation | CodeCode Available | 1 | 5 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 | 5 |
| Large Language Models are Contrastive Reasoners | Mar 13, 2024 | GSM8K | CodeCode Available | 1 | 5 |
| AskIt: Unified Programming Interface for Programming with Large Language Models | Aug 29, 2023 | Code GenerationFew-Shot Learning | CodeCode Available | 1 | 5 |