| DCR: Quantifying Data Contamination in LLMs Evaluation | Jul 15, 2025 | Arithmetic Reasoning, Benchmarking | Code Available | 0 |
| DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification | Jul 8, 2025 | ARC, Arithmetic Reasoning | Code Available | 0 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Jun 16, 2025 | Answer Generation, Arithmetic Reasoning | Unverified | 0 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 |
| Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Jun 4, 2025 | Arithmetic Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| DiaBlo: Diagonal Blocks Are Sufficient For Finetuning | Jun 3, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic Reasoning, Image Generation | Unverified | 0 |
| Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning | May 21, 2025 | Arithmetic Reasoning, Instruction Following | Unverified | 0 |
| Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits | May 20, 2025 | Arithmetic Reasoning | Unverified | 0 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Apr 30, 2025 | Arithmetic Reasoning, Text to SQL | Unverified | 0 |
| CAPO: Cost-Aware Prompt Optimization | Apr 22, 2025 | Arithmetic Reasoning, AutoML | Code Available | 2 |
| ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning | Apr 9, 2025 | Arithmetic Reasoning, valid | Unverified | 0 |
| Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure | Apr 2, 2025 | Arithmetic Reasoning, Data Augmentation | Code Available | 1 |
| Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training | Feb 25, 2025 | Arithmetic Reasoning, Data Augmentation | Unverified | 0 |
| The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | Feb 24, 2025 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Feb 21, 2025 | Arithmetic Reasoning | Code Available | 1 |
| Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | Feb 18, 2025 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | Feb 17, 2025 | Arithmetic Reasoning, Chart Understanding | Unverified | 0 |
| On Representational Dissociation of Language and Arithmetic in Large Language Models | Feb 17, 2025 | Arithmetic Reasoning | Unverified | 0 |
| Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | Feb 4, 2025 | Arithmetic Reasoning, Code Generation | Unverified | 0 |
| CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | Jan 30, 2025 | Arithmetic Reasoning, Text Generation | Unverified | 0 |
| SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Jan 28, 2025 | Arithmetic Reasoning, Memorization | Unverified | 0 |
| Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding | Jan 1, 2025 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models | Dec 30, 2024 | Arithmetic Reasoning, Quantization | Unverified | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs | Dec 19, 2024 | Arithmetic Reasoning, Code Generation | Unverified | 0 |
| Hint Marginalization for Improved Reasoning in Large Language Models | Dec 17, 2024 | Arithmetic Reasoning | Unverified | 0 |
| GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection | Dec 15, 2024 | Arithmetic Reasoning, Text Generation | Unverified | 0 |
| S^2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity | Dec 9, 2024 | Arithmetic Reasoning | Unverified | 0 |
| Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Arithmetic Reasoning | Dec 2, 2024 | Arithmetic Reasoning | Unverified | 0 |
| PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | Nov 12, 2024 | Arithmetic Reasoning, Mixture-of-Experts | Unverified | 0 |
| Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning | Nov 4, 2024 | Arithmetic Reasoning, Decoder | Code Available | 0 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Oct 28, 2024 | Arithmetic Reasoning, Math | Code Available | 1 |
| FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models | Oct 12, 2024 | Arithmetic Reasoning, Federated Learning | Code Available | 1 |
| Language Imbalance Driven Rewarding for Multilingual Self-improving | Oct 11, 2024 | Arithmetic Reasoning, Instruction Following | Code Available | 1 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Oct 10, 2024 | Arithmetic Reasoning, Math | Code Available | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic Reasoning, Computational Efficiency | Unverified | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | Oct 3, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Oct 2, 2024 | Arithmetic Reasoning, Large Language Model | Code Available | 4 |
| Small Language Models are Equation Reasoners | Sep 19, 2024 | Arithmetic Reasoning, Knowledge Distillation | Unverified | 0 |
| 3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability | Aug 28, 2024 | Arithmetic Reasoning, GPU | Code Available | 0 |
| Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks | Jul 25, 2024 | Arithmetic Reasoning | Unverified | 0 |
| Leveraging LLM Reasoning Enhances Personalized Recommender Systems | Jul 22, 2024 | Arithmetic Reasoning, Recommendation Systems | Unverified | 0 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Jul 21, 2024 | Arithmetic Reasoning, Math | Code Available | 1 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Jul 15, 2024 | Arithmetic Reasoning, Language Modeling | Unverified | 0 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 |
| Self-training Language Models for Arithmetic Reasoning | Jul 11, 2024 | Arithmetic Reasoning | Code Available | 0 |
| SBoRA: Low-Rank Adaptation with Regional Weight Updates | Jul 7, 2024 | Arithmetic Reasoning, parameter-efficient fine-tuning | Code Available | 0 |