| ProcessBench: Identifying Process Errors in Mathematical Reasoning | Dec 9, 2024 | GSM8K, Math | Code Available | 2 |
| When Dimensionality Reduction Meets Graph (Drawing) Theory: Introducing a Common Framework, Challenges and Opportunities | Dec 9, 2024 | Dimensionality Reduction, Math | Unverified | 0 |
| Chimera: Improving Generalist Model with Domain-Specific Experts | Dec 8, 2024 | Math, model | Unverified | 0 |
| Neuro-Symbolic Data Generation for Math Reasoning | Dec 6, 2024 | Diversity, Math | Unverified | 0 |
| Hard Math -- Easy UVM: Pragmatic solutions for verifying hardware algorithms using UVM | Dec 6, 2024 | Math | Unverified | 0 |
| Enhancing Mathematical Reasoning in LLMs with Background Operators | Dec 5, 2024 | Data Augmentation, Math | Unverified | 0 |
| Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer | Dec 5, 2024 | Code Generation, Decoder | Unverified | 0 |
| RedStone: Curating General, Code, Math, and QA Data for Large Language Models | Dec 4, 2024 | Domain Adaptation, Math | Unverified | 0 |
| U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | Dec 4, 2024 | Diversity, Math | Code Available | 1 |
| Unsupervised learning-based calibration scheme for Rough Bergomi model | Dec 3, 2024 | Math | Code Available | 0 |
| Free Process Rewards without Process Labels | Dec 2, 2024 | Math | Code Available | 5 |
| MALT: Improving Reasoning with Multi-Agent LLM Training | Dec 2, 2024 | Common Sense Reasoning, GSM8K | Unverified | 0 |
| Yi-Lightning Technical Report | Dec 2, 2024 | Chatbot, Large Language Model | Unverified | 0 |
| Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | Nov 29, 2024 | GSM8K, Math | Code Available | 1 |
| Reverse Thinking Makes LLMs Stronger Reasoners | Nov 29, 2024 | Data Augmentation, Knowledge Distillation | Unverified | 0 |
| A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems | Nov 28, 2024 | LEMMA, Math | Unverified | 0 |
| Mars-PO: Multi-Agent Reasoning System Preference Optimization | Nov 28, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students | Nov 27, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | Nov 27, 2024 | In-Context Learning, Math | Code Available | 0 |
| Training and Evaluating Language Models with Template-based Data Generation | Nov 27, 2024 | Data Augmentation, Math | Code Available | 1 |
| Preference Optimization for Reasoning with Pseudo Feedback | Nov 25, 2024 | GSM8K, Math | Code Available | 2 |
| Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures | Nov 25, 2024 | GSM8K, Math | Unverified | 0 |
| Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval | Nov 25, 2024 | Math, Math Word Problem Solving | Unverified | 0 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | Math, Mixture-of-Experts | Code Available | 2 |
| Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training | Nov 21, 2024 | Math | Unverified | 0 |