| Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance | Oct 3, 2023 | Code Generation, Logical Reasoning | Code Available | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | Oct 3, 2023 | Benchmarking, Instruction Following | Unverified | 0 |
| Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | Oct 3, 2023 | Math, Mathematical Reasoning | Unverified | 0 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Oct 3, 2023 | GSM8K, Math | Code Available | 0 |
| Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting | Sep 30, 2023 | Math | Unverified | 0 |
| L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | Sep 29, 2023 | Code Generation, Math | Unverified | 0 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | Fairness, Math | Unverified | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8K, HellaSwag | Unverified | 0 |
| Odd period cycles and ergodic properties in price dynamics for an exchange economy | Sep 17, 2023 | Math | Unverified | 0 |
| ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems | Sep 16, 2023 | Electrical Engineering, Math | Unverified | 0 |
| Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level | Sep 15, 2023 | Few-Shot Learning, High School Physics | Unverified | 0 |
| MathAttack: Attacking Large Language Models Towards Math Solving Ability | Sep 4, 2023 | Adversarial Attack, GSM8K | Unverified | 0 |
| Solving Math Word Problem with Problem Type Classification | Aug 26, 2023 | Answer Selection, Classification | Code Available | 0 |
| GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach | Aug 18, 2023 | Math | Unverified | 0 |
| Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems | Aug 10, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| NEOLAF, an LLM-powered neural-symbolic cognitive architecture | Aug 8, 2023 | Incremental Learning, Math | Unverified | 0 |
| Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Aug 7, 2023 | Math, Misconceptions | Code Available | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | Aug 3, 2023 | Math | Code Available | 0 |
| Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models | Aug 1, 2023 | In-Context Learning, Math | Unverified | 0 |
| Augmented Math: Authoring AR-Based Explorable Explanations by Augmenting Static Math Textbooks | Jul 30, 2023 | Math, Optical Character Recognition | Code Available | 0 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| ARB: Advanced Reasoning Benchmark for Large Language Models | Jul 25, 2023 | Math | Unverified | 0 |
| Explaining Math Word Problem Solvers | Jul 24, 2023 | Math | Unverified | 0 |
| Controlling Equational Reasoning in Large Language Models with Prompt Interventions | Jul 19, 2023 | Hallucination, In-Context Learning | Unverified | 0 |
| A mixed policy to improve performance of language models on math problems | Jul 17, 2023 | GSM8K, Math | Code Available | 0 |
| Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics | Jul 4, 2023 | Automated Theorem Proving, Math | Unverified | 0 |
| MWPRanker: An Expression Similarity Based Math Word Problem Retriever | Jul 3, 2023 | Logical Sequence, Math | Unverified | 0 |
| CMATH: Can Your Language Model Pass Chinese Elementary School Math Test? | Jun 29, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning | Jun 25, 2023 | counterfactual, Math | Unverified | 0 |
| Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Jun 24, 2023 | Decoder, Ingenuity | Code Available | 0 |
| A Survey on Multimodal Large Language Models | Jun 23, 2023 | Hallucination, In-Context Learning | Unverified | 0 |
| Public Attitudes Toward ChatGPT on Twitter: Sentiments, Topics, and Occupations | Jun 22, 2023 | Chatbot, Language Modelling | Code Available | 0 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | Jun 22, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Learning by Analogy: Diverse Questions Generation in Math Word Problem | Jun 15, 2023 | Math | Code Available | 0 |
| A Neural Network Implementation for Free Energy Principle | Jun 11, 2023 | Math | Unverified | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | Jun 10, 2023 | Math, Mathematical Reasoning | Unverified | 0 |
| PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts | Jun 7, 2023 | Cross-Lingual Paraphrase Identification, Machine Translation | Unverified | 0 |
| World Models for Math Story Problems | Jun 7, 2023 | Math | Code Available | 0 |
| Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Jun 3, 2023 | Math, Math Word Problem Solving | Code Available | 0 |
| Interpretable Math Word Problem Solution Generation Via Step-by-step Planning | Jun 1, 2023 | GSM8K, Language Modeling | Unverified | 0 |
| Modeling and Analyzing Scorer Preferences in Short-Answer Math Questions | Jun 1, 2023 | Math | Unverified | 0 |
| Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home | Jun 1, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Quantitative Methods for Optimizing Patient Outcomes in Liver Transplantation | May 31, 2023 | Management, Math | Unverified | 0 |
| Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard | May 30, 2023 | Chatbot, Math | Unverified | 0 |
| Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning | May 29, 2023 | Language Modelling, Large Language Model | Code Available | 0 |
| Emergent inabilities? Inverse scaling over the course of pretraining | May 24, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Complex Mathematical Symbol Definition Structures: A Dataset and Model for Coordination Resolution in Definition Extraction | May 24, 2023 | Definition Extraction, Math | Code Available | 0 |
| RSRM: Reinforcement Symbolic Regression Machine | May 24, 2023 | Math, Q-Learning | Unverified | 0 |
| Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective | May 24, 2023 | Decision Making, Math | Unverified | 0 |