| Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions | Jan 17, 2024 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| Large Language Models Are Neurosymbolic Reasoners | Jan 17, 2024 | Common Sense ReasoningMath | CodeCode Available | 1 |
| Augmenting Math Word Problems via Iterative Question Composing | Jan 17, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Jan 11, 2024 | MathMultiple-choice | CodeCode Available | 1 |
| Language Models Encode the Value of Numbers Linearly | Jan 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Dec 28, 2023 | GSM8KLanguage Model Evaluation | CodeCode Available | 1 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction FollowingMath | CodeCode Available | 1 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Get an A in Math: Progressive Rectification Prompting | Dec 11, 2023 | Math | CodeCode Available | 1 |
| Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Dec 7, 2023 | MathMultiple-choice | CodeCode Available | 1 |
| Eliciting Latent Knowledge from Quirky Language Models | Dec 2, 2023 | Anomaly DetectionMath | CodeCode Available | 1 |
| MathGloss: Building mathematical glossaries from text | Nov 21, 2023 | Math | CodeCode Available | 1 |
| DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents | Nov 16, 2023 | Math | CodeCode Available | 1 |
| FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Nov 16, 2023 | MathMath Word Problem Solving | CodeCode Available | 1 |
| StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving | Nov 15, 2023 | Math | CodeCode Available | 1 |
| Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration | Nov 14, 2023 | DiversityMath | CodeCode Available | 1 |
| Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Nov 9, 2023 | MathNatural Language Understanding | CodeCode Available | 1 |
| Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Nov 8, 2023 | FairnessMath | CodeCode Available | 1 |
| Implicit Chain of Thought Reasoning via Knowledge Distillation | Nov 2, 2023 | Knowledge DistillationMath | CodeCode Available | 1 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| Learning From Mistakes Makes LLM Better Reasoner | Oct 31, 2023 | GSM8KMath | CodeCode Available | 1 |
| An Early Evaluation of GPT-4V(ision) | Oct 25, 2023 | Math | CodeCode Available | 1 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | CodeCode Available | 1 |
| Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Oct 23, 2023 | Logical ReasoningMath | CodeCode Available | 1 |