| From Zero to Hero: Convincing with Extremely Complicated Math | Apr 1, 2023 | Math | Code Available | 1 | 5 |
| Nerva: a Truly Sparse Implementation of Neural Networks | Jul 24, 2024 | Math | Code Available | 1 | 5 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Apr 15, 2025 | Math, Quantum Machine Learning | Code Available | 1 | 5 |
| A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | May 30, 2025 | Math | Code Available | 1 | 5 |
| MWPToolkit: An Open-Source Framework for Deep Learning-Based Math Word Problem Solvers | Sep 2, 2021 | Math, Math Word Problem Solving | Code Available | 1 | 5 |
| Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Nov 9, 2023 | Math, Natural Language Understanding | Code Available | 1 | 5 |
| Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Sep 19, 2023 | Instruction Following, Language Modeling | Code Available | 1 | 5 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code Available | 1 | 5 |
| A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers | Jun 30, 2021 | Diversity, Math | Code Available | 1 | 5 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8K, Math | Code Available | 1 | 5 |
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Jun 18, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 1 | 5 |
| A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models | Oct 21, 2022 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 | 5 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 | 5 |
| Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts | Oct 23, 2023 | Logical Reasoning, Math | Code Available | 1 | 5 |
| NeMo-Inspector: A Visualization Tool for LLM Generation Analysis | May 1, 2025 | GSM8K, Math | Code Available | 1 | 5 |
| Non-Autoregressive Math Word Problem Solver with Unified Tree Structure | May 8, 2023 | Math, valid | Code Available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code Available | 1 | 5 |
| Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction | Jun 5, 2023 | Math | Code Available | 1 | 5 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | Clustering, Language Modeling | Code Available | 1 | 5 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8K, HumanEval | Code Available | 1 | 5 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1 | 5 |
| Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent | Dec 14, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | May 21, 2025 | Math, valid | Code Available | 1 | 5 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image Captioning, Language Modeling | Code Available | 1 | 5 |
| A Relation Spectrum Inheriting Taylor Series: Muscle Synergy and Coupling for Hand | Apr 25, 2020 | Math, Relation | Code Available | 1 | 5 |
| A Symbolic Character-Aware Model for Solving Geometry Problems | Aug 5, 2023 | Math, Multi-Label Classification | Code Available | 1 | 5 |
| Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Jun 4, 2023 | Math | Code Available | 1 | 5 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Aug 8, 2024 | GSM8K, Language Modeling | Code Available | 1 | 5 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | Chatbot, Language Modeling | Code Available | 1 | 5 |
| CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Oct 14, 2024 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Collective Constitutional AI: Aligning a Language Model with Public Input | Jun 12, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| A Categorical Archive of ChatGPT Failures | Feb 6, 2023 | Math | Code Available | 1 | 5 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 | 5 |
| Entropy-Regularized Process Reward Model | Dec 15, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Mar 19, 2024 | Math | Code Available | 1 | 5 |
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Feb 17, 2025 | Code Generation, HumanEval | Code Available | 1 | 5 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | Diagnostic, Math | Code Available | 1 | 5 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8K, Math | Code Available | 1 | 5 |
| Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Feb 16, 2025 | Code Generation, Instruction Following | Code Available | 1 | 5 |
| Language Models Encode the Value of Numbers Linearly | Jan 8, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Jan 27, 2023 | Few-Shot Learning, GSM8K | Code Available | 1 | 5 |
| Eliminating Position Bias of Language Models: A Mechanistic Approach | Jul 1, 2024 | Math, object-detection | Code Available | 1 | 5 |
| Large Language Models Can Be Easily Distracted by Irrelevant Context | Jan 31, 2023 | Arithmetic Reasoning, Language Modeling | Code Available | 1 | 5 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 | 5 |
| Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations | Feb 7, 2020 | Information Retrieval, Math | Code Available | 1 | 5 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8K, Math | Code Available | 1 | 5 |
| Efficient Reasoning for LLMs through Speculative Chain-of-Thought | Apr 27, 2025 | GSM8K, Math | Code Available | 1 | 5 |