| DCR: Quantifying Data Contamination in LLMs Evaluation | Jul 15, 2025 | Arithmetic ReasoningBenchmarking | CodeCode Available | 0 |
| DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification | Jul 8, 2025 | ARCArithmetic Reasoning | CodeCode Available | 0 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Jun 16, 2025 | Answer GenerationArithmetic Reasoning | —Unverified | 0 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic ReasoningMath | CodeCode Available | 0 |
| Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Jun 4, 2025 | Arithmetic ReasoningReinforcement Learning (RL) | —Unverified | 0 |
| DiaBlo: Diagonal Blocks Are Sufficient For Finetuning | Jun 3, 2025 | Arithmetic ReasoningCode Generation | CodeCode Available | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic ReasoningImage Generation | —Unverified | 0 |
| Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning | May 21, 2025 | Arithmetic ReasoningInstruction Following | —Unverified | 0 |
| Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits | May 20, 2025 | Arithmetic Reasoning | —Unverified | 0 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration | May 17, 2025 | Arithmetic ReasoningCode Generation | CodeCode Available | 0 |
| Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Apr 30, 2025 | Arithmetic ReasoningText to SQL | —Unverified | 0 |
| CAPO: Cost-Aware Prompt Optimization | Apr 22, 2025 | Arithmetic ReasoningAutoML | CodeCode Available | 2 |
| ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning | Apr 9, 2025 | Arithmetic Reasoningvalid | —Unverified | 0 |
| Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure | Apr 2, 2025 | Arithmetic ReasoningData Augmentation | CodeCode Available | 1 |
| Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training | Feb 25, 2025 | Arithmetic ReasoningData Augmentation | —Unverified | 0 |
| The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | Feb 24, 2025 | Arithmetic ReasoningCommon Sense Reasoning | —Unverified | 0 |
| Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Feb 21, 2025 | Arithmetic Reasoning | CodeCode Available | 1 |
| Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | Feb 18, 2025 | Arithmetic ReasoningCommon Sense Reasoning | —Unverified | 0 |
| On Representational Dissociation of Language and Arithmetic in Large Language Models | Feb 17, 2025 | Arithmetic Reasoning | —Unverified | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | Feb 17, 2025 | Arithmetic ReasoningChart Understanding | —Unverified | 0 |
| Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | Feb 4, 2025 | Arithmetic ReasoningCode Generation | —Unverified | 0 |
| CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | Jan 30, 2025 | Arithmetic ReasoningText Generation | —Unverified | 0 |
| SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | Jan 28, 2025 | Arithmetic ReasoningMemorization | —Unverified | 0 |
| Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding | Jan 1, 2025 | Arithmetic ReasoningLanguage Modeling | CodeCode Available | 1 |