| Cramer-Rao bound and absolute sensitivity in chemical reaction networks | Jan 13, 2024 | Math, Sensitivity | Unverified | 0 |
| CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | Jan 13, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Jan 11, 2024 | Math, Multiple-choice | Code Available | 1 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Jan 9, 2024 | GPU, Math | Code Available | 3 |
| Language Models Encode the Value of Numbers Linearly | Jan 8, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors | Jan 6, 2024 | Math | Unverified | 0 |
| Graph2Tac: Online Representation Learning of Formal Math Concepts | Jan 5, 2024 | AI Agent, Automated Theorem Proving | Unverified | 0 |
| Mastery Guided Non-parametric Clustering to Scale-up Strategy Prediction | Jan 4, 2024 | Clustering, Fairness | Unverified | 0 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Jan 4, 2024 | Instruction Following, Math | Code Available | 4 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8K, Language Model Evaluation | Code Available | 1 |