| Paper | Date | Tasks | Code | | |
|---|---|---|---|---|---|
| Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data | Feb 24, 2023 | Arithmetic Reasoning, Language Modelling | Code Available | 1 | 5 |
| Large Language Models are Better Reasoners with Self-Verification | Dec 19, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 1 | 5 |
| Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning | Feb 21, 2025 | Arithmetic Reasoning | Code Available | 1 | 5 |
| Automatic Model Selection with Large Language Models for Reasoning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure | Apr 2, 2025 | Arithmetic Reasoning, Data Augmentation | Code Available | 1 | 5 |
| Large Language Models Can Be Easily Distracted by Irrelevant Context | Jan 31, 2023 | Arithmetic Reasoning, Language Modeling | Code Available | 1 | 5 |
| FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models | Oct 12, 2024 | Arithmetic Reasoning, Federated Learning | Code Available | 1 | 5 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Jun 18, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 | 5 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 1 | 5 |