| Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level | Sep 15, 2023 | Few-Shot Learning, High School Physics | Unverified | 0 |
| MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Sep 11, 2023 | Math, Mathematical Reasoning | Code Available | 2 |
| GPT Can Solve Mathematical Problems Without a Calculator | Sep 6, 2023 | Language Modeling, Language Modelling | Code Available | 2 |
| MathAttack: Attacking Large Language Models Towards Math Solving Ability | Sep 4, 2023 | Adversarial Attack, GSM8K | Unverified | 0 |
| Solving Math Word Problem with Problem Type Classification | Aug 26, 2023 | Answer Selection, Classification | Code Available | 0 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 5 |
| GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach | Aug 18, 2023 | Math | Unverified | 0 |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Aug 15, 2023 | Arithmetic Reasoning, Math | Code Available | 2 |
| Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems | Aug 10, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Towards an AI to Win Ghana's National Science and Maths Quiz | Aug 8, 2023 | Math, Question Answering | Code Available | 1 |
| NEOLAF, an LLM-powered neural-symbolic cognitive architecture | Aug 8, 2023 | Incremental Learning, Math | Unverified | 0 |
| Cumulative Reasoning with Large Language Models | Aug 8, 2023 | Decision Making, Logical Reasoning | Code Available | 2 |
| Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data | Aug 7, 2023 | Math, Misconceptions | Code Available | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| Studying Large Language Model Generalization with Influence Functions | Aug 7, 2023 | counterfactual, Language Modeling | Code Available | 1 |
| A Symbolic Character-Aware Model for Solving Geometry Problems | Aug 5, 2023 | Math, Multi-Label Classification | Code Available | 1 |
| MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | Aug 4, 2023 | Math, MM-Vet | Code Available | 2 |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | Aug 3, 2023 | Math | Code Available | 0 |
| Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models | Aug 1, 2023 | In-Context Learning, Math | Unverified | 0 |
| SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | Aug 1, 2023 | GSM8K, Math | Code Available | 1 |
| Augmented Math: Authoring AR-Based Explorable Explanations by Augmenting Static Math Textbooks | Jul 30, 2023 | Math, Optical Character Recognition | Code Available | 0 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| ARB: Advanced Reasoning Benchmark for Large Language Models | Jul 25, 2023 | Math | Unverified | 0 |
| Explaining Math Word Problem Solvers | Jul 24, 2023 | Math | Unverified | 0 |
| Controlling Equational Reasoning in Large Language Models with Prompt Interventions | Jul 19, 2023 | Hallucination, In-Context Learning | Unverified | 0 |
| How is ChatGPT's behavior changing over time? | Jul 18, 2023 | Code Generation, Language Modelling | Code Available | 4 |
| A mixed policy to improve performance of language models on math problems | Jul 17, 2023 | GSM8K, Math | Code Available | 0 |
| Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics | Jul 4, 2023 | Automated Theorem Proving, Math | Unverified | 0 |
| MWPRanker: An Expression Similarity Based Math Word Problem Retriever | Jul 3, 2023 | Logical Sequence, Math | Unverified | 0 |
| CMATH: Can Your Language Model Pass Chinese Elementary School Math Test? | Jun 29, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | Jun 27, 2023 | Automated Theorem Proving, GPU | Code Available | 2 |
| Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning | Jun 25, 2023 | counterfactual, Math | Unverified | 0 |
| Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Jun 24, 2023 | Decoder, Ingenuity | Code Available | 0 |
| A Survey on Multimodal Large Language Models | Jun 23, 2023 | Hallucination, In-Context Learning | Unverified | 0 |
| Public Attitudes Toward ChatGPT on Twitter: Sentiments, Topics, and Occupations | Jun 22, 2023 | Chatbot, Language Modelling | Code Available | 0 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | Jun 22, 2023 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Learning by Analogy: Diverse Questions Generation in Math Word Problem | Jun 15, 2023 | Math | Code Available | 0 |
| SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts | Jun 15, 2023 | Math | Code Available | 1 |
| A Neural Network Implementation for Free Energy Principle | Jun 11, 2023 | Math | Unverified | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | Jun 10, 2023 | Math, Mathematical Reasoning | Unverified | 0 |
| PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts | Jun 7, 2023 | Cross-Lingual Paraphrase Identification, Machine Translation | Unverified | 0 |
| World Models for Math Story Problems | Jun 7, 2023 | Math | Code Available | 0 |
| Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Jun 5, 2023 | Math | Code Available | 1 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | Code Available | 1 |
| Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | Jun 3, 2023 | Math, Math Word Problem Solving | Code Available | 0 |
| MathChat: Converse to Tackle Challenging Math Problems with LLM Agents | Jun 2, 2023 | Elementary Mathematics, Math | Code Available | 1 |
| Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Jun 2, 2023 | Math, Mathematical Reasoning | Code Available | 1 |
| AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Jun 1, 2023 | Autonomous Driving, Cloud Computing | Code Available | 6 |
| Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home | Jun 1, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Modeling and Analyzing Scorer Preferences in Short-Answer Math Questions | Jun 1, 2023 | Math | Unverified | 0 |