| An Early Evaluation of GPT-4V(ision) | Oct 25, 2023 | Math | CodeCode Available | 1 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | CodeCode Available | 1 |
| Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Oct 23, 2023 | Logical ReasoningMath | CodeCode Available | 1 |
| We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields | Oct 23, 2023 | DiversityMath | CodeCode Available | 0 |
| Teaching Language Models to Self-Improve through Interactive Demonstrations | Oct 20, 2023 | Math | CodeCode Available | 1 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Oct 19, 2023 | GSM8KMath | CodeCode Available | 0 |
| Llemma: An Open Language Model For Mathematics | Oct 16, 2023 | Arithmetic ReasoningAutomated Theorem Proving | CodeCode Available | 3 |
| Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes | Oct 16, 2023 | Decision MakingMath | CodeCode Available | 1 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | Oct 16, 2023 | Code GenerationGSM8K | —Unverified | 0 |
| Improving Large Language Model Fine-tuning for Solving Math Problems | Oct 16, 2023 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Solving Math Word Problems with Reexamination | Oct 14, 2023 | DescriptiveMath | CodeCode Available | 0 |
| An Expression Tree Decoding Strategy for Mathematical Equation Generation | Oct 14, 2023 | MathMathematical Reasoning | CodeCode Available | 2 |
| The Search-and-Mix Paradigm in Approximate Nash Equilibrium Algorithms | Oct 12, 2023 | Math | —Unverified | 0 |
| LLMs as Potential Brainstorming Partners for Math and Science Problems | Oct 10, 2023 | Math | —Unverified | 0 |
| Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding | Oct 10, 2023 | Mathvalid | CodeCode Available | 1 |
| Mistral 7B | Oct 10, 2023 | answerability predictionArithmetic Reasoning | CodeCode Available | 6 |
| MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Oct 9, 2023 | Arithmetic ReasoningData Augmentation | CodeCode Available | 2 |
| How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | Oct 9, 2023 | Code GenerationInstruction Following | CodeCode Available | 3 |
| Guiding Language Model Reasoning with Planning Tokens | Oct 9, 2023 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models | Oct 7, 2023 | Math | —Unverified | 0 |
| Critique Ability of Large Language Models | Oct 7, 2023 | Code CompletionDecision Making | —Unverified | 0 |
| Analysis of the Reasoning with Redundant Information Provided Ability of Large Language Models | Oct 6, 2023 | 8kMath | —Unverified | 0 |
| Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Oct 6, 2023 | Code GenerationDecision Making | CodeCode Available | 2 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Oct 5, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 |
| Concise and Organized Perception Facilitates Reasoning in Large Language Models | Oct 5, 2023 | LAMBADAMath | —Unverified | 0 |
| Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference | Oct 4, 2023 | MathQuestion Answering | CodeCode Available | 1 |
| The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices | Oct 4, 2023 | ArticlesMath | CodeCode Available | 0 |
| Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | Oct 3, 2023 | MathMathematical Reasoning | —Unverified | 0 |
| Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance | Oct 3, 2023 | Code GenerationLogical Reasoning | CodeCode Available | 0 |
| Large Language Models as Analogical Reasoners | Oct 3, 2023 | Code GenerationGSM8K | —Unverified | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | Oct 3, 2023 | BenchmarkingInstruction Following | —Unverified | 0 |
| SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training | Oct 3, 2023 | Contrastive LearningEquation Discovery | CodeCode Available | 1 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | ChatbotImage Captioning | CodeCode Available | 2 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Oct 3, 2023 | GSM8KMath | CodeCode Available | 0 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | BenchmarkingMath | CodeCode Available | 1 |
| Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting | Sep 30, 2023 | Math | —Unverified | 0 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic ReasoningComputational Efficiency | CodeCode Available | 3 |
| L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | Sep 29, 2023 | Code GenerationMath | —Unverified | 0 |
| Qwen Technical Report | Sep 28, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 6 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Sep 27, 2023 | BenchmarkingMath | CodeCode Available | 1 |
| ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Sep 22, 2023 | Math | CodeCode Available | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | FairnessMath | —Unverified | 0 |
| Design of Chain-of-Thought in Math Problem Solving | Sep 20, 2023 | DiversityGSM8K | CodeCode Available | 1 |
| Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Sep 19, 2023 | Instruction FollowingLanguage Modeling | CodeCode Available | 1 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8KHellaSwag | —Unverified | 0 |
| Odd period cycles and ergodic properties in price dynamics for an exchange economy | Sep 17, 2023 | Math | —Unverified | 0 |
| ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems | Sep 16, 2023 | Electrical EngineeringMath | —Unverified | 0 |