| Cramer-Rao bound and absolute sensitivity in chemical reaction networks | Jan 13, 2024 | MathSensitivity | —Unverified | 0 |
| CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | Jan 13, 2024 | MathMathematical Reasoning | —Unverified | 0 |
| The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Jan 11, 2024 | MathMultiple-choice | CodeCode Available | 1 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Jan 9, 2024 | GPUMath | CodeCode Available | 3 |
| Language Models Encode the Value of Numbers Linearly | Jan 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math Errors | Jan 6, 2024 | Math | —Unverified | 0 |
| Graph2Tac: Online Representation Learning of Formal Math Concepts | Jan 5, 2024 | AI AgentAutomated Theorem Proving | —Unverified | 0 |
| Mastery Guided Non-parametric Clustering to Scale-up Strategy Prediction | Jan 4, 2024 | ClusteringFairness | —Unverified | 0 |
| LLaMA Pro: Progressive LLaMA with Block Expansion | Jan 4, 2024 | Instruction FollowingMath | CodeCode Available | 4 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8KLanguage Model Evaluation | CodeCode Available | 1 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language IdentificationMath | CodeCode Available | 2 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | ChatbotGSM8K | —Unverified | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | DiversityGSM8K | —Unverified | 0 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction FollowingMath | CodeCode Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic ReasoningFew-Shot Learning | —Unverified | 0 |
| Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | Dec 11, 2023 | DiversityMath | —Unverified | 0 |
| Get an A in Math: Progressive Rectification Prompting | Dec 11, 2023 | Math | CodeCode Available | 1 |
| LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning | Dec 7, 2023 | In-Context LearningMath | —Unverified | 0 |
| Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Dec 7, 2023 | MathMultiple-choice | CodeCode Available | 1 |
| ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions | Dec 4, 2023 | Arithmetic ReasoningMath | CodeCode Available | 0 |
| Eliciting Latent Knowledge from Quirky Language Models | Dec 2, 2023 | Anomaly DetectionMath | CodeCode Available | 1 |
| YUAN 2.0: A Large Language Model with Localized Filtering-based Attention | Nov 27, 2023 | Code GenerationLanguage Modeling | CodeCode Available | 2 |
| REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints | Nov 22, 2023 | Computational EfficiencyMath | —Unverified | 0 |
| MathGloss: Building mathematical glossaries from text | Nov 21, 2023 | Math | CodeCode Available | 1 |
| Meta Prompting for AI Systems | Nov 20, 2023 | Data InteractionGSM8K | CodeCode Available | 2 |
| System 2 Attention (is something you might need too) | Nov 20, 2023 | Math | CodeCode Available | 2 |
| DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents | Nov 16, 2023 | Math | CodeCode Available | 1 |
| FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Nov 16, 2023 | MathMath Word Problem Solving | CodeCode Available | 1 |
| StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving | Nov 15, 2023 | Math | CodeCode Available | 1 |
| Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration | Nov 14, 2023 | DiversityMath | CodeCode Available | 1 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | Nov 14, 2023 | GSM8KMath | —Unverified | 0 |
| SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks | Nov 14, 2023 | GSM8KMath | —Unverified | 0 |
| VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency | Nov 13, 2023 | MathMathematical Reasoning | CodeCode Available | 0 |
| Large Language Models' Understanding of Math: Source Criticism and Extrapolation | Nov 12, 2023 | Automated Theorem ProvingMath | —Unverified | 0 |
| Let's Reinforce Step by Step | Nov 10, 2023 | GSM8KLogical Reasoning | —Unverified | 0 |
| Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Nov 9, 2023 | MathNatural Language Understanding | CodeCode Available | 1 |
| Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Nov 9, 2023 | MathQuestion Answering | CodeCode Available | 2 |
| Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Nov 8, 2023 | FairnessMath | CodeCode Available | 1 |
| Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models | Nov 7, 2023 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation | Nov 7, 2023 | MathRAG | —Unverified | 0 |
| ATHENA: Mathematical Reasoning with Thought Expansion | Nov 2, 2023 | MathMathematical Reasoning | CodeCode Available | 0 |
| Implicit Chain of Thought Reasoning via Knowledge Distillation | Nov 2, 2023 | Knowledge DistillationMath | CodeCode Available | 1 |
| Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving | Nov 1, 2023 | In-Context LearningLanguage Modeling | CodeCode Available | 0 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| Exploring the Reliability of Large Language Models as Customized Evaluators for Diverse NLP Tasks | Oct 30, 2023 | FairnessMath | CodeCode Available | 0 |
| math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories | Oct 25, 2023 | Automated Theorem ProvingLanguage Modeling | —Unverified | 0 |