| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Apr 30, 2025 | Arithmetic Reasoning, Text to SQL | Unverified | 0 |
| CAPO: Cost-Aware Prompt Optimization | Apr 22, 2025 | Arithmetic Reasoning, AutoML | Code Available | 2 |
| ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning | Apr 9, 2025 | Arithmetic Reasoning, valid | Unverified | 0 |
| Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure | Apr 2, 2025 | Arithmetic Reasoning, Data Augmentation | Code Available | 1 |
| Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training | Feb 25, 2025 | Arithmetic Reasoning, Data Augmentation | Unverified | 0 |
| The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | Feb 24, 2025 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Feb 21, 2025 | Arithmetic Reasoning | Code Available | 1 |
| Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | Feb 18, 2025 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| On Representational Dissociation of Language and Arithmetic in Large Language Models | Feb 17, 2025 | Arithmetic Reasoning | Unverified | 0 |