| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 7, 2025 | Math | —Unverified | 0 |
| Extracting the Unknown from Long Math Problems | Mar 22, 2021 | Math | —Unverified | 0 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | Fairness, Math | —Unverified | 0 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | Jan 31, 2025 | Data Valuation, Math | —Unverified | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer Selection, Math | —Unverified | 0 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPU, Math | —Unverified | 0 |
| Fast Diffusion Inhibits Disease Outbreaks | Jul 29, 2019 | Math | —Unverified | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | May 26, 2025 | Math | —Unverified | 0 |
| Feature Selection Based on Confidence Machine | Oct 20, 2014 | feature selection, Math | —Unverified | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | —Unverified | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | Math, MMLU | —Unverified | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian | Mar 27, 2024 | Language Modelling, Math | —Unverified | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Jun 16, 2025 | Math | —Unverified | 0 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | Nov 14, 2023 | GSM8K, Math | —Unverified | 0 |
| Fixation probabilities for the Moran process in evolutionary games with two strategies: graph shapes and large population asymptotics | Apr 30, 2018 | Math | —Unverified | 0 |
| Fixation probabilities for the Moran process with three or more strategies: general and coupling results | Nov 23, 2018 | Math | —Unverified | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | Oct 22, 2024 | Math | —Unverified | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | Aug 4, 2024 | Math | —Unverified | 0 |
| Formal Mathematical Reasoning: A New Frontier in AI | Dec 20, 2024 | Automated Theorem Proving, Math | —Unverified | 0 |
| The Long-Term Effects of Teachers' Gender Stereotypes | Dec 16, 2022 | Math | —Unverified | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | Oct 7, 2024 | Math | —Unverified | 0 |