| Interpretable Math Word Problem Solution Generation Via Step-by-step Planning | Jun 1, 2023 | GSM8K, Language Modeling | Unverified | 0 |
| Quantitative Methods for Optimizing Patient Outcomes in Liver Transplantation | May 31, 2023 | Management, Math | Unverified | 0 |
| Let's Verify Step by Step | May 31, 2023 | Active Learning, Math | Code Available | 4 |
| Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard | May 30, 2023 | Chatbot, Math | Unverified | 0 |
| Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning | May 29, 2023 | Language Modelling, Large Language Model | Code Available | 0 |
| Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective | May 24, 2023 | Decision Making, Math | Unverified | 0 |
| Emergent inabilities? Inverse scaling over the course of pretraining | May 24, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| GRACE: Discriminator-Guided Chain-of-Thought Reasoning | May 24, 2023 | GSM8K, Math | Code Available | 1 |
| Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction | May 24, 2023 | Definition Extraction, Math | Code Available | 0 |
| Reasoning with Language Model is Planning with World Model | May 24, 2023 | Language Modeling, Language Modelling | Code Available | 4 |
| Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic | May 24, 2023 | Logical Reasoning, Math | Code Available | 0 |
| The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models | May 24, 2023 | Language Modelling, Math | Code Available | 1 |
| RSRM: Reinforcement Symbolic Regression Machine | May 24, 2023 | Math, Q-Learning | Unverified | 0 |
| MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems | May 23, 2023 | Language Modelling, Large Language Model | Code Available | 1 |
| RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning | May 23, 2023 | In-Context Learning, Language Modelling | Code Available | 1 |
| ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | May 23, 2023 | Math | Code Available | 1 |
| CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models | May 23, 2023 | 2k, Math | Code Available | 1 |
| Cognitive network science reveals bias in GPT-3, ChatGPT, and GPT-4 mirroring math anxiety in high-school students | May 22, 2023 | Math, Text Generation | Unverified | 0 |
| Let GPT be a Math Tutor: Teaching Math Word Problem Solvers with Customized Exercise Generation | May 22, 2023 | Knowledge Tracing, Math | Unverified | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | Unverified | 0 |
| TEIMMA: The First Content Reuse Annotator for Text, Images, and Math | May 22, 2023 | Math | Code Available | 0 |
| TheoremQA: A Theorem-driven Question Answering dataset | May 21, 2023 | Math, Question Answering | Code Available | 1 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | May 19, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| A quantitative study of NLP approaches to question difficulty estimation | May 17, 2023 | Math, Multiple-choice | Code Available | 0 |
| Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency | May 14, 2023 | Arithmetic Reasoning, Math | Code Available | 0 |
| CodeT5+: Open Code Large Language Models for Code Understanding and Generation | May 13, 2023 | Arithmetic Reasoning, Code Completion | Code Available | 0 |
| Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces | May 12, 2023 | Clustering, Fairness | Unverified | 0 |
| Algebra Error Classification with Large Language Models | May 8, 2023 | Classification, Math | Code Available | 0 |
| Non-Autoregressive Math Word Problem Solver with Unified Tree Structure | May 8, 2023 | Math, valid | Code Available | 1 |
| Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | May 6, 2023 | Math | Code Available | 7 |
| AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays | Apr 24, 2023 | Math | Unverified | 0 |
| Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers | Apr 21, 2023 | Math, Multiple-choice | Unverified | 0 |
| Progressive-Hint Prompting Improves Reasoning in Large Language Models | Apr 19, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Enhancing Textbooks with Visuals from the Web for Improved Learning | Apr 18, 2023 | Math | Code Available | 0 |
| Metric-agnostic Ranking Optimization | Apr 17, 2023 | Information Retrieval, Learning-To-Rank | Unverified | 0 |
| What Makes a Good Dataset for Symbol Description Reading? | Apr 17, 2023 | document understanding, Math | Unverified | 0 |
| Solving Math Word Problems by Combining Language Models With Symbolic Solvers | Apr 16, 2023 | GSM8K, Language Modeling | Code Available | 1 |
| Gamifying Math Education using Object Detection | Apr 13, 2023 | Math, Object | Unverified | 0 |
| AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | Apr 13, 2023 | Decision Making, Math | Code Available | 2 |
| Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task | Apr 11, 2023 | Deep Reinforcement Learning, Explainable artificial intelligence | Unverified | 0 |
| From Zero to Hero: Convincing with Extremely Complicated Math | Apr 1, 2023 | Math | Code Available | 1 |
| Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases | Mar 26, 2023 | Math | Unverified | 0 |
| Reliable and Efficient Evaluation of Adversarial Robustness for Deep Hashing-Based Retrieval | Mar 22, 2023 | Adversarial Robustness, Deep Hashing | Unverified | 0 |
| Mind meets machine: Unravelling GPT-4's cognitive psychology | Mar 20, 2023 | Common Sense Reasoning, Decision Making | Unverified | 0 |
| OntoMath^PRO 2.0 Ontology: Updates of the Formal Model | Mar 17, 2023 | Management, Math | Unverified | 0 |
| How well do Large Language Models perform in Arithmetic tasks? | Mar 16, 2023 | Math | Code Available | 1 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability prediction, Arithmetic Reasoning | Code Available | 6 |
| SALSA PICANTE: a machine learning attack on LWE with binary secrets | Mar 7, 2023 | Math | Code Available | 1 |
| Self-reinforced polynomial approximation methods for concentrated probability densities | Mar 5, 2023 | Math | Unverified | 0 |
| MathPrompter: Mathematical Reasoning using Large Language Models | Mar 4, 2023 | Arithmetic Reasoning, Math | Code Available | 1 |