| Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams | Nov 7, 2024 | Math | —Unverified | 0 |
| Meta-Reasoning Improves Tool Use in Large Language Models | Nov 7, 2024 | Math | Code Available | 0 |
| Self-Consistency Preference Optimization | Nov 6, 2024 | GSM8K, Math | —Unverified | 0 |
| Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology | Nov 5, 2024 | Math, Misconceptions | —Unverified | 0 |
| Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification | Nov 4, 2024 | Math, Reranking | Code Available | 0 |
| Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models | Nov 4, 2024 | Inductive Bias, Language Modeling | Code Available | 1 |
| Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | Nov 2, 2024 | GSM8K, Math | —Unverified | 0 |
| STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Nov 1, 2024 | 2k, In-Context Learning | —Unverified | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | Oct 29, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring | Oct 29, 2024 | Math | —Unverified | 0 |