| The Hallucination Tax of Reinforcement Finetuning | May 20, 2025 | HallucinationMath | —Unverified | 0 | 0 |
| Explaining Math Word Problem Solvers | Jul 24, 2023 | Math | —Unverified | 0 | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | Apr 4, 2025 | MathMathematical Reasoning | —Unverified | 0 | 0 |
| Explanation Generation for a Math Word Problem Solver | Oct 1, 2015 | Explanation GenerationMath | —Unverified | 0 | 0 |
| Explicit Knowledge Transfer for Weakly-Supervised Code Generation | Nov 30, 2022 | Code GenerationFew-Shot Learning | —Unverified | 0 | 0 |
| Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia | Jan 25, 2024 | Math | —Unverified | 0 | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | BenchmarkingMath | —Unverified | 0 | 0 |
| Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Mar 20, 2025 | MathMemorization | —Unverified | 0 | 0 |
| Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases | Mar 26, 2023 | Math | —Unverified | 0 | 0 |
| Calculus on MDPs: Potential Shaping as a Gradient | Aug 20, 2022 | Math | —Unverified | 0 | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | MathMathematical Reasoning | —Unverified | 0 | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | BenchmarkingMath | —Unverified | 0 | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 7, 2025 | Math | —Unverified | 0 | 0 |
| Extracting the Unknown from Long Math Problems | Mar 22, 2021 | Math | —Unverified | 0 | 0 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | FairnessMath | —Unverified | 0 | 0 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | Jan 31, 2025 | Data ValuationMath | —Unverified | 0 | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer SelectionMath | —Unverified | 0 | 0 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPUMath | —Unverified | 0 | 0 |
| Fast Diffusion Inhibits Disease Outbreaks | Jul 29, 2019 | Math | —Unverified | 0 | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | May 26, 2025 | Math | —Unverified | 0 | 0 |
| Feature Selection Based on Confidence Machine | Oct 20, 2014 | feature selectionMath | —Unverified | 0 | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | MathMultiple-choice | —Unverified | 0 | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | MathMMLU | —Unverified | 0 | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8KHallucination | —Unverified | 0 | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | MathMathematical Reasoning | —Unverified | 0 | 0 |