| Pheromone-based Learning of Optimal Reasoning Paths | Jan 31, 2025 | ARCGSM8K | —Unverified | 0 |
| PixelWorld: Towards Perceiving Everything as Pixels | Jan 31, 2025 | Math | —Unverified | 0 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | Jan 31, 2025 | Data ValuationMath | —Unverified | 0 |
| Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | Jan 30, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Examining the Robustness of Large Language Models across Language Complexity | Jan 30, 2025 | Math | —Unverified | 0 |
| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Jan 28, 2025 | MathMathematical Problem-Solving | —Unverified | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Jan 26, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning | Jan 25, 2025 | Math | —Unverified | 0 |
| DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images | Jan 24, 2025 | Math | —Unverified | 0 |
| Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages | Jan 23, 2025 | Instruction FollowingMath | —Unverified | 0 |
| Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | Jan 21, 2025 | GSM8KIn-Context Learning | —Unverified | 0 |
| An Optimal Transport approach to arbitrage correction: Application to volatility Stress-Tests | Jan 21, 2025 | Math | —Unverified | 0 |
| RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems? | Jan 20, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | Jan 19, 2025 | Automated Theorem ProvingMath | —Unverified | 0 |
| Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis | Jan 18, 2025 | cognitive diagnosisMath | CodeCode Available | 0 |
| Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback | Jan 18, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision | Jan 14, 2025 | Instruction FollowingMath | CodeCode Available | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Jan 14, 2025 | GSM8KMath | CodeCode Available | 0 |
| Can Vision-Language Models Evaluate Handwritten Math? | Jan 13, 2025 | Math | CodeCode Available | 0 |
| Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models | Jan 10, 2025 | Math | —Unverified | 0 |
| Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction | Jan 9, 2025 | MathSentence | CodeCode Available | 0 |
| A General Retrieval-Augmented Generation Framework for Multimodal Case-Based Reasoning Applications | Jan 9, 2025 | MathRAG | —Unverified | 0 |
| End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach | Jan 8, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving | Jan 7, 2025 | DiversityKnowledge Distillation | CodeCode Available | 0 |
| Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Jan 6, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | Jan 6, 2025 | GSM8KHumanEval | —Unverified | 0 |
| Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap | Jan 5, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Empowering Bengali Education with AI: Solving Bengali Math Word Problems through Transformer Models | Jan 5, 2025 | Math | —Unverified | 0 |
| Instruction-Following Pruning for Large Language Models | Jan 3, 2025 | Instruction FollowingMath | —Unverified | 0 |
| A Probabilistic Model for Node Classification in Directed Graphs | Jan 3, 2025 | MathNode Classification | CodeCode Available | 0 |
| Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | Jan 3, 2025 | GSM8KMath | —Unverified | 0 |
| DIVE: Diversified Iterative Self-Improvement | Jan 1, 2025 | DiversityGSM8K | CodeCode Available | 0 |
| Experimental Demonstration of an Optical Neural PDE Solver via On-Chip PINN Training | Jan 1, 2025 | Math | —Unverified | 0 |
| Rethink Delay Doppler Channels and Time-Frequency Coding | Dec 31, 2024 | Math | —Unverified | 0 |
| Measuring Large Language Models Capacity to Annotate Journalistic Sourcing | Dec 30, 2024 | BenchmarkingEthics | —Unverified | 0 |
| Slow Perception: Let's Perceive Geometric Figures Step-by-step | Dec 30, 2024 | MathVisual Reasoning | —Unverified | 0 |
| Dynamic Skill Adaptation for Large Language Models | Dec 26, 2024 | Math | —Unverified | 0 |
| StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs | Dec 23, 2024 | BenchmarkingLogical Reasoning | —Unverified | 0 |
| Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning | Dec 23, 2024 | Math | —Unverified | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic ReasoningGSM8K | —Unverified | 0 |
| Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions | Dec 22, 2024 | GSM8KMath | —Unverified | 0 |
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERPGSM8K | —Unverified | 0 |
| Correct implied volatility shapes and reliable pricing in the rough Heston model | Dec 20, 2024 | Math | —Unverified | 0 |
| Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation | Dec 20, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning | Dec 20, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Formal Mathematical Reasoning: A New Frontier in AI | Dec 20, 2024 | Automated Theorem ProvingMath | —Unverified | 0 |
| Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Dec 19, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| Conceptual In-Context Learning and Chain of Concepts: Solving Complex Conceptual Problems Using Large Language Models | Dec 19, 2024 | In-Context LearningMath | —Unverified | 0 |
| AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling | Dec 19, 2024 | Math | —Unverified | 0 |
| Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning | Dec 19, 2024 | Math | —Unverified | 0 |