| OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Feb 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 4 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | Feb 9, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Exploring Group and Symmetry Principles in Large Language Models | Feb 9, 2024 | Arithmetic Reasoning, Negation | Unverified | 0 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 |
| Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting | Jan 28, 2024 | Arithmetic Reasoning, Fact Checking | Unverified | 0 |
| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Large Language Models are Null-Shot Learners | Jan 16, 2024 | Arithmetic Reasoning, Benchmarking | Unverified | 0 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Jan 5, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 2 |
| LLM Augmented LLMs: Expanding Capabilities through Composition | Jan 4, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data | Dec 20, 2023 | Arithmetic Reasoning | Code Available | 1 |
| Gemini: A Family of Highly Capable Multimodal Models | Dec 19, 2023 | 1 Image, 2*2 Stitching, Arithmetic Reasoning | Code Available | 1 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic Reasoning, Few-Shot Learning | Unverified | 0 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Dec 9, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 0 |
| Prompt Optimization via Adversarial In-Context Learning | Dec 5, 2023 | Arithmetic Reasoning, Data-to-Text Generation | Code Available | 1 |
| ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions | Dec 4, 2023 | Arithmetic Reasoning, Math | Code Available | 0 |
| Generative Parameter-Efficient Fine-Tuning | Dec 1, 2023 | Arithmetic Reasoning, Fine-Grained Image Classification | Code Available | 1 |
| Orca 2: Teaching Small Language Models How to Reason | Nov 18, 2023 | Arithmetic Reasoning, Common Sense Reasoning | Unverified | 0 |
| OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs | Nov 16, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Prompt Sketching for Large Language Models | Nov 8, 2023 | Arithmetic Reasoning, Benchmarking | Unverified | 0 |
| Llemma: An Open Language Model For Mathematics | Oct 16, 2023 | Arithmetic Reasoning, Automated Theorem Proving | Code Available | 3 |
| Empirical Study of Zero-Shot NER with ChatGPT | Oct 16, 2023 | Arithmetic Reasoning, named-entity-recognition | Code Available | 1 |