| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Sep 30, 2024 | GSM8K, Math | Code Available | 0 |
| INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models | Sep 28, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Evaluation of OpenAI o1: Opportunities and Challenges of AGI | Sep 27, 2024 | Emotion Recognition, Large Language Model | Unverified | 0 |
| HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | Sep 27, 2024 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Revisiting the Superficial Alignment Hypothesis | Sep 27, 2024 | Instruction Following, Math | Unverified | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | Sep 25, 2024 | Chatbot, GSM8K | Unverified | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data Augmentation, Diversity | Unverified | 0 |
| InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning | Sep 19, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | Sep 18, 2024 | GSM8K, Math | Unverified | 0 |
| RoMath: A Mathematical Reasoning Benchmark in Romanian | Sep 17, 2024 | Mathematical Reasoning | Code Available | 0 |
| Causal Inference with Large Language Model: A Survey | Sep 15, 2024 | Causal Inference, Language Modeling | Unverified | 0 |
| Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | Sep 13, 2024 | Contrastive Learning, Language Modeling | Unverified | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | Sep 13, 2024 | ARC, Code Generation | Unverified | 0 |
| MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model | Sep 10, 2024 | Diversity, Language Modeling | Unverified | 0 |
| Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4 | Sep 9, 2024 | Abstract Algebra, Automated Theorem Proving | Code Available | 0 |
| From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks | Sep 6, 2024 | Machine Translation, Mathematical Reasoning | Unverified | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | Unverified | 0 |
| S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners | Sep 3, 2024 | GSM8K, Math | Unverified | 0 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | Aug 29, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | Aug 28, 2024 | Data Augmentation, GSM8K | Unverified | 0 |
| AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding | Aug 28, 2024 | Mathematical Reasoning | Unverified | 0 |
| Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation | Aug 28, 2024 | Knowledge Distillation, Language Modelling | Unverified | 0 |
| Tangram: Benchmark for Evaluating Geometric Element Recognition in Large Multimodal Models | Aug 25, 2024 | Mathematical Reasoning | Unverified | 0 |
| Path-Consistency: Prefix Enhancement for Efficient Inference in LLM | Aug 25, 2024 | Code Generation, Common Sense Reasoning | Unverified | 0 |