| GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers | Dec 12, 2024 | GSM8K, Prompt Engineering | Code Available | 1 |
| Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries | Dec 12, 2024 | 4k, GSM8K | Code Available | 1 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8K, Math | Code Available | 1 |
| What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | Nov 12, 2024 | GSM8K, Math | Code Available | 1 |
| UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Nov 11, 2024 | Code Generation, GSM8K | Code Available | 1 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Oct 27, 2024 | GSM8K, HellaSwag | Code Available | 1 |
| Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Oct 22, 2024 | GSM8K, Language Modeling | Code Available | 1 |
| Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Oct 8, 2024 | GSM8K, Multi-agent Reinforcement Learning | Code Available | 1 |
| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 7, 2024 | GSM8K, Logical Reasoning | Code Available | 1 |
| Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks | Sep 20, 2024 | ARC, GSM8K | Code Available | 1 |
| Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | Sep 17, 2024 | GSM8K, Question Answering | Code Available | 1 |
| SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Aug 21, 2024 | 8k, GSM8K | Code Available | 1 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Aug 8, 2024 | GSM8K, Language Modeling | Code Available | 1 |
| Learning Goal-Conditioned Representations for Language Reward Models | Jul 18, 2024 | GSM8K, Math | Code Available | 1 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Jul 4, 2024 | Avg, GSM8K | Code Available | 1 |
| Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Jun 30, 2024 | GSM8K, Math | Code Available | 1 |
| LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback | Jun 20, 2024 | Binary Classification, GSM8K | Code Available | 1 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Jun 18, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling | Jun 17, 2024 | GSM8K, Math | Code Available | 1 |
| ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification | May 23, 2024 | GPU, GSM8K | Code Available | 1 |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | May 23, 2024 | Computational Efficiency, GSM8K | Code Available | 1 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8K, HumanEval | Code Available | 1 |
| Markovian Transformers for Informative Language Modeling | Apr 29, 2024 | GSM8K, Informativeness | Code Available | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Apr 18, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 |