| Cramer-Rao bound and absolute sensitivity in chemical reaction networks | Jan 13, 2024 | Math, Sensitivity | Unverified | 0 |
| CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | Jan 13, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Jan 11, 2024 | Math, Multiple-choice | Code Available | 1 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Jan 9, 2024 | GPU, Math | Code Available | 3 |
| Language Models Encode the Value of Numbers Linearly | Jan 8, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors | Jan 6, 2024 | Math | Unverified | 0 |
| Graph2Tac: Online Representation Learning of Formal Math Concepts | Jan 5, 2024 | AI Agent, Automated Theorem Proving | Unverified | 0 |
| Mastery Guided Non-parametric Clustering to Scale-up Strategy Prediction | Jan 4, 2024 | Clustering, Fairness | Unverified | 0 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Jan 4, 2024 | Instruction Following, Math | Code Available | 4 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8K, Language Model Evaluation | Code Available | 1 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | Chatbot, GSM8K | Unverified | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | Diversity, GSM8K | Unverified | 0 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction Following, Math | Code Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic Reasoning, Few-Shot Learning | Unverified | 0 |
| Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | Dec 11, 2023 | Diversity, Math | Unverified | 0 |
| Get an A in Math: Progressive Rectification Prompting | Dec 11, 2023 | Math | Code Available | 1 |
| LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning | Dec 7, 2023 | In-Context Learning, Math | Unverified | 0 |
| Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Dec 7, 2023 | Math, Multiple-choice | Code Available | 1 |
| ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions | Dec 4, 2023 | Arithmetic Reasoning, Math | Code Available | 0 |
| Eliciting Latent Knowledge from Quirky Language Models | Dec 2, 2023 | Anomaly Detection, Math | Code Available | 1 |
| YUAN 2.0: A Large Language Model with Localized Filtering-based Attention | Nov 27, 2023 | Code Generation, Language Modeling | Code Available | 2 |