| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | Apr 16, 2025 | GSM8K, Math | —Unverified | 0 |
| Entropy Martingale Optimal Transport and Nonlinear Pricing-Hedging Duality | May 26, 2020 | Math | —Unverified | 0 |
| EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | Oct 28, 2024 | ARC, Math | —Unverified | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Jan 26, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| The Effect of Teacher Gender on Student Achievement in Primary School | Oct 31, 2014 | Math | —Unverified | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | May 29, 2025 | GSM8K, Math | —Unverified | 0 |
| The Entropic Measure Transform | Feb 21, 2019 | Math | —Unverified | 0 |
| Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams | Nov 7, 2024 | Math | —Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | —Unverified | 0 |
| The Function Transformation Omics - Funomics | Aug 17, 2018 | Math | —Unverified | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning | Dec 23, 2024 | Math | —Unverified | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | —Unverified | 0 |
| Can I understand what I create? Self-Knowledge Evaluation of Large Language Models | Jun 10, 2024 | Math | —Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | —Unverified | 0 |
| The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers | Aug 22, 2018 | Math, Semantic Parsing | —Unverified | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Aug 9, 2024 | Math, Multiple-choice | —Unverified | 0 |
| Examining the Robustness of Large Language Models across Language Complexity | Jan 30, 2025 | Math | —Unverified | 0 |
| Wavelet GPT: Wavelet Inspired Large Language Models | Sep 4, 2024 | Decoder, Math | —Unverified | 0 |
| Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia | Jan 25, 2024 | Math | —Unverified | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | —Unverified | 0 |
| Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Mar 20, 2025 | Math, Memorization | —Unverified | 0 |
| Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases | Mar 26, 2023 | Math | —Unverified | 0 |
| Calculus on MDPs: Potential Shaping as a Gradient | Aug 20, 2022 | Math | —Unverified | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | —Unverified | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 7, 2025 | Math | —Unverified | 0 |
| Extracting the Unknown from Long Math Problems | Mar 22, 2021 | Math | —Unverified | 0 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | Fairness, Math | —Unverified | 0 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | Jan 31, 2025 | Data Valuation, Math | —Unverified | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer Selection, Math | —Unverified | 0 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPU, Math | —Unverified | 0 |
| Fast Diffusion Inhibits Disease Outbreaks | Jul 29, 2019 | Math | —Unverified | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | May 26, 2025 | Math | —Unverified | 0 |
| Feature Selection Based on Confidence Machine | Oct 20, 2014 | feature selection, Math | —Unverified | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | —Unverified | 0 |
| Few-Shot Recalibration of Language Models | Mar 27, 2024 | Math, MMLU | —Unverified | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian | Mar 27, 2024 | Language Modelling, Math | —Unverified | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Jun 16, 2025 | Math | —Unverified | 0 |
| First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning | Nov 14, 2023 | GSM8K, Math | —Unverified | 0 |
| Fixation probabilities for the Moran process in evolutionary games with two strategies: graph shapes and large population asymptotics | Apr 30, 2018 | Math | —Unverified | 0 |
| Fixation probabilities for the Moran process with three or more strategies: general and coupling results | Nov 23, 2018 | Math | —Unverified | 0 |
| Building Math Agents with Multi-Turn Iterative Preference Learning | Sep 4, 2024 | GSM8K, Math | —Unverified | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | Oct 22, 2024 | Math | —Unverified | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | Aug 4, 2024 | Math | —Unverified | 0 |
| Formal Mathematical Reasoning: A New Frontier in AI | Dec 20, 2024 | Automated Theorem Proving, Math | —Unverified | 0 |
| The Long-Term Effects of Teachers' Gender Stereotypes | Dec 16, 2022 | Math | —Unverified | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | Oct 7, 2024 | Math | —Unverified | 0 |