| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | May 24, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 0 | 5 |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Jun 20, 2024 | GSM8KLanguage Model Evaluation | CodeCode Available | 0 | 5 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8KLogical Reasoning | CodeCode Available | 0 | 5 |
| Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Dec 20, 2024 | Computational EfficiencyGSM8K | CodeCode Available | 0 | 5 |
| In-Context Principle Learning from Mistakes | Feb 8, 2024 | GSM8KIn-Context Learning | CodeCode Available | 0 | 5 |
| A mixed policy to improve performance of language models on math problems | Jul 17, 2023 | GSM8KMath | CodeCode Available | 0 | 5 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Oct 14, 2024 | Density Ratio EstimationGSM8K | CodeCode Available | 0 | 5 |
| DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Jul 16, 2025 | GSM8K | CodeCode Available | 0 | 5 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Feb 20, 2025 | GSM8KNatural Language Understanding | CodeCode Available | 0 | 5 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8KMath | CodeCode Available | 0 | 5 |