Math

Papers

Showing 751–800 of 1596 papers

Title	Date	Tasks	Status
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation	Apr 16, 2025	GSM8K, Math	Unverified
Entropy Martingale Optimal Transport and Nonlinear Pricing-Hedging Duality	May 26, 2020	Math	Unverified
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation	Oct 28, 2024	ARC, Math	Unverified
Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework	Jan 26, 2025	Math, Mathematical Reasoning	Unverified
The Effect of Teacher Gender on Student Achievement in Primary School	Oct 31, 2014	Math	Unverified
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8K, Math	Unverified
The Entropic Measure Transform	Feb 21, 2019	Math	Unverified
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams	Nov 7, 2024	Math	Unverified
Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics	Apr 24, 2025	Code Generation, Math	Unverified
The Function Transformation Omics - Funomics	Aug 17, 2018	Math	Unverified
Evaluating Robustness of Reward Models for Mathematical Reasoning	Oct 2, 2024	Math, Mathematical Reasoning	Unverified
Evaluating the Design Features of an Intelligent Tutoring System for Advanced Mathematics Learning	Dec 23, 2024	Math	Unverified
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified
Can I understand what I create? Self-Knowledge Evaluation of Large Language Models	Jun 10, 2024	Math	Unverified
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8K, Math	Unverified
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers	Aug 22, 2018	Math, Semantic Parsing	Unverified
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil	Aug 9, 2024	Math, Multiple-choice	Unverified
Examining the Robustness of Large Language Models across Language Complexity	Jan 30, 2025	Math	Unverified
Wavelet GPT: Wavelet Inspired Large Language Models	Sep 4, 2024	Decoder, Math	Unverified
Exploring Educational Equity: A Machine Learning Approach to Unravel Achievement Disparities in Georgia	Jan 25, 2024	Math	Unverified
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate	May 22, 2023	Benchmarking, Math	Unverified
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them	Mar 20, 2025	Math, Memorization	Unverified
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases	Mar 26, 2023	Math	Unverified
Calculus on MDPs: Potential Shaping as a Gradient	Aug 20, 2022	Math	Unverified
Exploring the Mystery of Influential Data for Mathematical Reasoning	Apr 1, 2024	Math, Mathematical Reasoning	Unverified
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity	Jun 7, 2025	Math	Unverified
Extracting the Unknown from Long Math Problems	Mar 22, 2021	Math	Unverified
Fairness Hub Technical Briefs: AUC Gap	Sep 20, 2023	Fairness, Math	Unverified
Fairshare Data Pricing via Data Valuation for Large Language Models	Jan 31, 2025	Data Valuation, Math	Unverified
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4	Mar 5, 2025	Answer Selection, Math	Unverified
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems	Mar 18, 2025	CPU, Math	Unverified
Fast Diffusion Inhibits Disease Outbreaks	Jul 29, 2019	Math	Unverified
Faster and Better LLMs via Latency-Aware Test-Time Scaling	May 26, 2025	Math	Unverified
Feature Selection Based on Confidence Machine	Oct 20, 2014	feature selection, Math	Unverified
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory	Mar 13, 2025	Math, Multiple-choice	Unverified
Few-Shot Recalibration of Language Models	Mar 27, 2024	Math, MMLU	Unverified
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Oct 8, 2024	GSM8K, Hallucination	Unverified
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models	Mar 12, 2024	Math, Mathematical Reasoning	Unverified
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian	Mar 27, 2024	Language Modelling, Math	Unverified
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models	Jun 16, 2025	Math	Unverified
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning	Nov 14, 2023	GSM8K, Math	Unverified
Fixation probabilities for the Moran process in evolutionary games with two strategies: graph shapes and large population asymptotics	Apr 30, 2018	Math	Unverified
Fixation probabilities for the Moran process with three or more strategies: general and coupling results	Nov 23, 2018	Math	Unverified
Building Math Agents with Multi-Turn Iterative Preference Learning	Sep 4, 2024	GSM8K, Math	Unverified
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration	Oct 22, 2024	Math	Unverified
The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule	Aug 4, 2024	Math	Unverified
Formal Mathematical Reasoning: A New Frontier in AI	Dec 20, 2024	Automated Theorem Proving, Math	Unverified
The Long-Term Effects of Teachers' Gender Stereotypes	Dec 16, 2022	Math	Unverified
fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models	Oct 7, 2024	Math	Unverified
