Mathematical Reasoning

Papers


Showing 726–750 of 805 papers

Title	Date	Tasks	Status
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available
Techniques to Improve Neural Math Word Problem Solvers	Feb 6, 2023	Decoder, Language Modelling	Code Available
CER: Confidence Enhanced Reasoning in LLMs	Feb 20, 2025	Math, Mathematical Reasoning	Code Available
Compositional Generalization with Tree Stack Memory Units	Nov 5, 2019	Mathematical Reasoning, Zero-shot Generalization	Code Available
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning	Feb 11, 2025	Code Generation, Math	Code Available
Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation	Dec 20, 2024	Math, Mathematical Reasoning	Code Available
Temporal Consistency for LLM Reasoning Process Error Identification	Mar 18, 2025	Mathematical Reasoning	Code Available
MC-NEST -- Enhancing Mathematical Reasoning in Large Language Models with a Monte Carlo Nash Equilibrium Self-Refine Tree	Nov 23, 2024	Decision Making, Mathematical Reasoning	Code Available
Reverse Operation based Data Augmentation for Solving Math Word Problems	Oct 4, 2020	Data Augmentation, Math	Code Available
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available
A Survey of Deep Learning for Geometry Problem Solving	Jul 16, 2025	Deep Learning, Geometry Problem Solving	Code Available
Can LLMs Solve Longer Math Word Problems Better?	May 23, 2024	Data Augmentation, Math	Code Available
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions	Oct 5, 2024	Benchmarking, Hallucination	Code Available
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions	May 24, 2025	Automated Theorem Proving, Math	Code Available
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange	Mar 30, 2024	Math, Mathematical Problem-Solving	Code Available
MCC-KD: Multi-CoT Consistent Knowledge Distillation	Oct 23, 2023	Diversity, Knowledge Distillation	Code Available
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements	Jun 24, 2023	Decoder, Ingenuity	Code Available
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning	Feb 27, 2024	8k, Language Modeling	Code Available
Can A Gamer Train A Mathematical Reasoning Model?	Jun 10, 2025	GPU, Mathematical Reasoning	Code Available
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark	Aug 14, 2024	Math, Mathematical Reasoning	Code Available
RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation	May 30, 2025	Code Generation, Diversity	Code Available
Position: AI Evaluation Should Learn from How We Test Humans	Jun 18, 2023	Mathematical Reasoning, Position	Code Available
RoMath: A Mathematical Reasoning Benchmark in Romanian	Sep 17, 2024	Mathematical Reasoning	Code Available
MathScale: Scaling Instruction Tuning for Mathematical Reasoning	Mar 5, 2024	GSM8K, Math	Code Available
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8K, Math	Code Available


Datasets: AIME24, FrontierMath, Lila (IID), Lila (OOD), PGPS9K, AMC23, GeoQA, Math500, UniGeo, UniGeo (PRV)

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Acc	94.4	—	Unverified
2	DeepSeek-R1	Acc	79.8	—	Unverified
3	OpenAI o1	Acc	74.4	—	Unverified
4	OpenAI o1-mini	Acc	70	—	Unverified
5	Search-o1	Acc	56.7	—	Unverified
6	s1-32B	Acc	56.7	—	Unverified
7	OpenAI o1-preview	Acc	44.6	—	Unverified
8	Qwen2.5-72B-Instruct	Acc	23.3	—	Unverified
9	Claude 3.5 Sonnet	Acc	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	o3	Accuracy	0.25	—	Unverified
2	Gemini 1.5 Pro (002)	Accuracy	0.02	—	Unverified
3	GPT-4o	Accuracy	0.01	—	Unverified
4	o1-mini	Accuracy	0.01	—	Unverified
5	o1-preview	Accuracy	0.01	—	Unverified
6	Claude 3.5 Sonnet	Accuracy	0.01	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Codex (Few-Shot, 175B)	Accuracy	0.6	—	Unverified
2	Bhāskara-P (Fine-tuned, 2.7B)	Accuracy	0.48	—	Unverified
3	Neo-P (Fine-tuned, 2.7B)	Accuracy	0.39	—	Unverified
4	GPT-3 (Few-Shot, 175B)	Accuracy	0.38	—	Unverified
5	Bhāskara-A (Fine-tuned, 2.7B)	Accuracy	0.25	—	Unverified
6	Neo-A (Fine-tuned, 2.7B)	Accuracy	0.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Codex (Few-Shot, 175B)	Accuracy	0.59	—	Unverified
2	Bhāskara-P (Fine-tuned, 2.7B)	Accuracy	0.45	—	Unverified
3	GPT-3 (Few-Shot, 175B)	Accuracy	0.38	—	Unverified
4	Bhāskara-A (Fine-tuned, 2.7B)	Accuracy	0.27	—	Unverified
5	Neo-P (Fine-tuned, 2.7B)	Accuracy	0.24	—	Unverified
6	Neo-A (Fine-tuned, 2.7B)	Accuracy	0.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GOLD	Completion accuracy	65.8	—	Unverified
2	PGPSNet	Completion accuracy	62.7	—	Unverified
3	GAPS	Completion accuracy	61.2	—	Unverified
4	Inter-GPS	Completion accuracy	59.8	—	Unverified
5	Geoformer	Completion accuracy	35.6	—	Unverified
6	NGS	Completion accuracy	34.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	QwQ-32B-Preview	Acc	82.5	—	Unverified
2	Math-Master	Acc	82	—	Unverified
3	Qwen2.5-Math-7B-Instruct	Acc	62.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GOLD	Accuracy (%)	75.2	—	Unverified
2	GAPS	Accuracy (%)	67.8	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Search-o1	Acc	86.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GOLD	Accuracy (%)	98.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GAPS	Accuracy (%)	97.5	—	Unverified