| The Hallucination Tax of Reinforcement Finetuning | May 20, 2025 | Hallucination, Math | —Unverified | 0 | 0 |
| Explaining Math Word Problem Solvers | Jul 24, 2023 | Math | —Unverified | 0 | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | Apr 4, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Explanation Generation for a Math Word Problem Solver | Oct 1, 2015 | Explanation Generation, Math | —Unverified | 0 | 0 |
| Explicit Knowledge Transfer for Weakly-Supervised Code Generation | Nov 30, 2022 | Code Generation, Few-Shot Learning | —Unverified | 0 | 0 |
| Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia | Jan 25, 2024 | Math | —Unverified | 0 | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | —Unverified | 0 | 0 |
| Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Mar 20, 2025 | Math, Memorization | —Unverified | 0 | 0 |
| Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases | Mar 26, 2023 | Math | —Unverified | 0 | 0 |
| Calculus on MDPs: Potential Shaping as a Gradient | Aug 20, 2022 | Math | —Unverified | 0 | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 7, 2025 | Math | —Unverified | 0 | 0 |
| Extracting the Unknown from Long Math Problems | Mar 22, 2021 | Math | —Unverified | 0 | 0 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | Fairness, Math | —Unverified | 0 | 0 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | Jan 31, 2025 | Data Valuation, Math | —Unverified | 0 | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer Selection, Math | —Unverified | 0 | 0 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPU, Math | —Unverified | 0 | 0 |
| Fast Diffusion Inhibits Disease Outbreaks | Jul 29, 2019 | Math | —Unverified | 0 | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | May 26, 2025 | Math | —Unverified | 0 | 0 |
| Feature Selection Based on Confidence Machine | Oct 20, 2014 | feature selection, Math | —Unverified | 0 | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | —Unverified | 0 | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | Math, MMLU | —Unverified | 0 | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian | Mar 27, 2024 | Language Modelling, Math | —Unverified | 0 | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Jun 16, 2025 | Math | —Unverified | 0 | 0 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | Nov 14, 2023 | GSM8K, Math | —Unverified | 0 | 0 |
| Fixation probabilities for the Moran process in evolutionary games with two strategies: graph shapes and large population asymptotics | Apr 30, 2018 | Math | —Unverified | 0 | 0 |
| Fixation probabilities for the Moran process with three or more strategies: general and coupling results | Nov 23, 2018 | Math | —Unverified | 0 | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | Oct 22, 2024 | Math | —Unverified | 0 | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | Aug 4, 2024 | Math | —Unverified | 0 | 0 |
| Formal Mathematical Reasoning: A New Frontier in AI | Dec 20, 2024 | Automated Theorem Proving, Math | —Unverified | 0 | 0 |
| The Long-Term Effects of Teachers' Gender Stereotypes | Dec 16, 2022 | Math | —Unverified | 0 | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | Oct 7, 2024 | Math | —Unverified | 0 | 0 |
| FRACTAL: Fine-Grained Scoring from Aggregate Text Labels | Apr 7, 2024 | Math, Multiple Instance Learning | —Unverified | 0 | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Jan 31, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | —Unverified | 0 | 0 |
| From fixation probabilities to d-player games: an inverse problem in evolutionary dynamics | Nov 20, 2018 | Math, Unity | —Unverified | 0 | 0 |
| The Mathematics of Market Timing | Dec 13, 2017 | Math | —Unverified | 0 | 0 |
| From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting | Dec 18, 2023 | Diversity, GSM8K | —Unverified | 0 | 0 |
| From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision | Mar 21, 2024 | Math | —Unverified | 0 | 0 |
| From Textbooks to Knowledge: A Case Study in Harvesting Axiomatic Knowledge from Textbooks to Solve Geometry Problems | Sep 1, 2017 | Math, Question Answering | —Unverified | 0 | 0 |
| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | —Unverified | 0 | 0 |
| Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | Oct 18, 2024 | Math, Question Answering | —Unverified | 0 | 0 |
| Bridging Offline and Online Reinforcement Learning for LLMs | Jun 26, 2025 | Instruction Following, Math | —Unverified | 0 | 0 |
| Breaking Ties: Regression Discontinuity Design Meets Market Design | Dec 31, 2020 | Math, regression | —Unverified | 0 | 0 |
| Gamifying Math Education using Object Detection | Apr 13, 2023 | Math, Object | —Unverified | 0 | 0 |
| GAPS: Geometry-Aware Problem Solver | Jan 29, 2024 | Geometry Problem Solving, Math | —Unverified | 0 | 0 |