Mathematical Reasoning

Papers


Showing 551–600 of 805 papers

Title	Date	Tasks	Status
Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents	May 19, 2025	Mathematical Reasoning	Unverified
Herald: A Natural Language Annotated Lean 4 Dataset	Oct 9, 2024	Math, Mathematical Reasoning	Unverified
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models	Sep 27, 2024	Code Generation, Mathematical Reasoning	Unverified
HOFT: Householder Orthogonal Fine-tuning	May 22, 2025	Machine Translation, Mathematical Reasoning	Unverified
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study	Apr 1, 2025	Code Generation, Math	Unverified
How Does Quantization Affect Multilingual LLMs?	Jul 3, 2024	Mathematical Reasoning, Quantization	Unverified
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs	Oct 17, 2024	Mathematical Reasoning	Unverified
HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation	May 26, 2025	Mathematical Reasoning	Unverified
Improve Mathematical Reasoning in Language Models by Automated Process Supervision	Jun 5, 2024	GSM8K, Math	Unverified
Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation	Nov 22, 2024	Knowledge Distillation, Mathematical Reasoning	Unverified
Improving Multilingual Math Reasoning for African Languages	May 26, 2025	Math, Mathematical Reasoning	Unverified
Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents	Dec 1, 2024	Mathematical Reasoning, MMLU	Unverified
Improving RL Exploration for LLM Reasoning through Retrospective Replay	Apr 19, 2025	Code Generation, Mathematical Reasoning	Unverified
Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations	Jan 31, 2025	Mathematical Reasoning	Unverified
Distilling Mathematical Reasoning Capabilities into Small Language Models	Jan 22, 2024	Mathematical Reasoning	Unverified
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks	Oct 24, 2024	Logical Reasoning, Mathematical Problem-Solving	Unverified
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning	Sep 19, 2024	Math, Mathematical Reasoning	Unverified
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking	Mar 25, 2025	In-Context Learning, Mathematical Reasoning	Unverified
Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs	Jun 25, 2025	Mathematical Reasoning	Unverified
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models	Feb 18, 2025	Data Augmentation, GSM8K	Unverified
Integrating External Tools with Large Language Models to Improve Accuracy	Jul 9, 2025	Mathematical Reasoning, MMLU	Unverified
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Jan 21, 2025	Instruction Following, Mathematical Reasoning	Unverified
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination	Jun 10, 2023	Math, Mathematical Reasoning	Unverified
Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles	Jun 16, 2025	Diversity, Mathematical Reasoning	Unverified
Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study	Jun 13, 2025	Language Modeling, Language Modelling	Unverified
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models	Jun 5, 2024	Mathematical Reasoning, Natural Language Inference	Unverified
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8K, Math	Unverified
iTBLS: A Dataset of Interactive Conversations Over Tabular Information	Apr 19, 2024	Articles, Mathematical Reasoning	Unverified
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving	Jun 19, 2023	In-Context Learning, Language Modeling	Unverified
Keep Guessing? When Considering Inference Scaling, Mind the Baselines	Oct 20, 2024	Mathematical Reasoning	Unverified
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	Mar 4, 2024	GSM8K, Math	Unverified
Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model	Jul 14, 2024	Language Modeling, Language Modelling	Unverified
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Jul 15, 2025	GSM8K, Language Modeling	Unverified
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey	May 6, 2025	Mathematical Reasoning	Unverified
Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments	Dec 26, 2023	Knowledge Distillation, Mathematical Reasoning	Unverified
Kwai-STaR: Transform LLMs into State-Transition Reasoners	Nov 7, 2024	GSM8K, Mathematical Problem-Solving	Unverified
KwaiYiiMath: Technical Report	Oct 11, 2023	Arithmetic Reasoning, GSM8K	Unverified
Mathematical Reasoning via Self-supervised Skip-tree Training	Jun 8, 2020	Language Modeling, Language Modelling	Unverified
Language Models Use Trigonometry to Do Addition	Feb 2, 2025	Language Modeling, Language Modelling	Unverified
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem	Nov 25, 2023	Geometry Problem Solving, Language Modelling	Unverified
Large Language Models and Mathematical Reasoning Failures	Feb 17, 2025	Mathematical Reasoning, Physical Intuition	Unverified
Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective	Jun 30, 2025	Mathematical Reasoning	Unverified
Large Language Models for Combinatorial Optimization of Design Structure Matrix	Nov 19, 2024	Combinatorial Optimization, Mathematical Reasoning	Unverified
Large Language Models for Design Structure Matrix Optimization	Jun 11, 2025	Combinatorial Optimization, Mathematical Reasoning	Unverified
Large Language Models for Mathematical Reasoning: Progresses and Challenges	Jan 31, 2024	Diversity, Math	Unverified
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens	Jun 10, 2025	Benchmarking, Mathematical Reasoning	Unverified
Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems	Jan 30, 2024	Mathematical Reasoning, RAG	Unverified
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training	Jun 27, 2025	Knowledge Distillation, Mathematical Reasoning	Unverified
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models	Oct 2, 2024	Cross-Lingual Transfer, Math	Unverified
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction	Feb 25, 2025	Automated Theorem Proving, Mathematical Reasoning	Unverified


Datasets: AIME24, FrontierMath, Lila (IID), Lila (OOD), PGPS9K, AMC23, GeoQA, Math500, UniGeo, UniGeo (PRV)

Benchmark Results

Dataset: AIME24
#	Model	Metric	Claimed	Verified	Status
1	Xolver	Acc	94.4	—	Unverified
2	DeepSeek-r1	Acc	79.8	—	Unverified
3	Openai-o1	Acc	74.4	—	Unverified
4	Openai-o1-mini	Acc	70	—	Unverified
5	Search-o1	Acc	56.7	—	Unverified
6	s1-32B	Acc	56.7	—	Unverified
7	Openai-o1-preview	Acc	44.6	—	Unverified
8	Qwen2.5-72B-Instruct	Acc	23.3	—	Unverified
9	Claude3.5-Sonnet	Acc	16	—	Unverified

Dataset: FrontierMath
#	Model	Metric	Claimed	Verified	Status
1	o3	Accuracy	0.25	—	Unverified
2	Gemini 1.5 Pro (002)	Accuracy	0.02	—	Unverified
3	GPT-4o	Accuracy	0.01	—	Unverified
4	o1-mini	Accuracy	0.01	—	Unverified
5	o1-preview	Accuracy	0.01	—	Unverified
6	Claude 3.5 Sonnet	Accuracy	0.01	—	Unverified

Dataset: Lila (IID)
#	Model	Metric	Claimed	Verified	Status
1	Codex (Few-Shot, 175B)	Accuracy	0.6	—	Unverified
2	Bhāskara-P (Fine-tuned, 2.7B)	Accuracy	0.48	—	Unverified
3	Neo-P (Fine-tuned, 2.7B)	Accuracy	0.39	—	Unverified
4	GPT-3 (Few-Shot, 175B)	Accuracy	0.38	—	Unverified
5	Bhāskara-A (Fine-tuned, 2.7B)	Accuracy	0.25	—	Unverified
6	Neo-A (Fine-tuned, 2.7B)	Accuracy	0.2	—	Unverified

Dataset: Lila (OOD)
#	Model	Metric	Claimed	Verified	Status
1	Codex (Few-Shot, 175B)	Accuracy	0.59	—	Unverified
2	Bhāskara-P (Fine-tuned, 2.7B)	Accuracy	0.45	—	Unverified
3	GPT-3 (Few-Shot, 175B)	Accuracy	0.38	—	Unverified
4	Bhāskara-A (Fine-tuned, 2.7B)	Accuracy	0.27	—	Unverified
5	Neo-P (Fine-tuned, 2.7B)	Accuracy	0.24	—	Unverified
6	Neo-A (Fine-tuned, 2.7B)	Accuracy	0.18	—	Unverified

Dataset: PGPS9K
#	Model	Metric	Claimed	Verified	Status
1	GOLD	Completion accuracy	65.8	—	Unverified
2	PGPSNet	Completion accuracy	62.7	—	Unverified
3	GAPS	Completion accuracy	61.2	—	Unverified
4	Inter-GPS	Completion accuracy	59.8	—	Unverified
5	Geoformer	Completion accuracy	35.6	—	Unverified
6	NGS	Completion accuracy	34.1	—	Unverified

Dataset: AMC23
#	Model	Metric	Claimed	Verified	Status
1	QWQ-32B-preview	Acc	82.5	—	Unverified
2	Math-Master	Acc	82	—	Unverified
3	Qwen2.5-Math-7B-instruct	Acc	62.5	—	Unverified

Dataset: GeoQA
#	Model	Metric	Claimed	Verified	Status
1	GOLD	Accuracy (%)	75.2	—	Unverified
2	GAPS	Accuracy (%)	67.8	—	Unverified

Dataset: Math500
#	Model	Metric	Claimed	Verified	Status
1	Search-o1	Acc	86.4	—	Unverified

Dataset: UniGeo
#	Model	Metric	Claimed	Verified	Status
1	GOLD	Accuracy (%)	98.5	—	Unverified

Dataset: UniGeo (PRV)
#	Model	Metric	Claimed	Verified	Status
1	GAPS	Accuracy (%)	97.5	—	Unverified