SOTAVerified

Mathematical Reasoning

Papers

Showing 451-500 of 805 papers

| Title | Status | Hype |
|---|---|---|
| Improving Multilingual Math Reasoning for African Languages | | 0 |
| Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents | | 0 |
| Improving RL Exploration for LLM Reasoning through Retrospective Replay | | 0 |
| Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations | | 0 |
| Distilling Mathematical Reasoning Capabilities into Small Language Models | | 0 |
| Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | | 0 |
| InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning | | 0 |
| Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking | | 0 |
| Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs | | 0 |
| Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models | | 0 |
| Integrating External Tools with Large Language Models to Improve Accuracy | | 0 |
| InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | | 0 |
| Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles | | 0 |
| Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | | 0 |
| IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | | 0 |
| Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | | 0 |
| iTBLS: A Dataset of Interactive Conversations Over Tabular Information | | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0 |
| Keep Guessing? When Considering Inference Scaling, Mind the Baselines | | 0 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0 |
| Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model | | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | | 0 |
| Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey | | 0 |
| Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments | | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | | 0 |
| KwaiYiiMath: Technical Report | | 0 |
| Mathematical Reasoning via Self-supervised Skip-tree Training | | 0 |
| Language Models Use Trigonometry to Do Addition | | 0 |
| LANS: A Layout-Aware Neural Solver for Plane Geometry Problem | | 0 |
| Large Language Models and Mathematical Reasoning Failures | | 0 |
| Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective | | 0 |
| Large Language Models for Combinatorial Optimization of Design Structure Matrix | | 0 |
| Large Language Models for Design Structure Matrix Optimization | | 0 |
| Large Language Models for Mathematical Reasoning: Progresses and Challenges | | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | | 0 |
| Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems | | 0 |
| Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training | | 0 |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | | 0 |
| LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction | | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | | 0 |
| Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning | | 0 |
| Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation | | 0 |
| Learning to chain-of-thought with Jensen's evidence lower bound | | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0 |
| Learning to Reason With Relational Abstractions | | 0 |
| LemmaHead: RAG Assisted Proof Generation Using Large Language Models | | 0 |
| Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | | 0 |
| Let's Reinforce Step by Step | | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | | 0 |
Page 10 of 17

Benchmark Results

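| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Acc | 94.4 | | Unverified |
| 2 | DeepSeek-r1 | Acc | 79.8 | | Unverified |
| 3 | Openai-o1 | Acc | 74.4 | | Unverified |
| 4 | Openai-o1-mini | Acc | 70 | | Unverified |
| 5 | Search-o1 | Acc | 56.7 | | Unverified |
| 6 | s1-32B | Acc | 56.7 | | Unverified |
| 7 | Openai-o1-preview | Acc | 44.6 | | Unverified |
| 8 | Qwen2.5-72B-Instruct | Acc | 23.3 | | Unverified |
| 9 | Claude3.5-Sonnet | Acc | 16 | | Unverified |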
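
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | o3 | Accuracy | 0.25 | | Unverified |
| 2 | Gemini 1.5 Pro (002) | Accuracy | 0.02 | | Unverified |
| 3 | GPT-4o | Accuracy | 0.01 | | Unverified |
| 4 | o1-mini | Accuracy | 0.01 | | Unverified |
| 5 | o1-preview | Accuracy | 0.01 | | Unverified |
| 6 | Claude 3.5 Sonnet | Accuracy | 0.01 | | Unverified |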

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Codex (Few-Shot, 175B) | Accuracy | 0.6 | | Unverified |
| 2 | Bhāskara-P (Fine-tuned, 2.7B) | Accuracy | 0.48 | | Unverified |
| 3 | Neo-P (Fine-tuned, 2.7B) | Accuracy | 0.39 | | Unverified |
| 4 | GPT-3 (Few-Shot, 175B) | Accuracy | 0.38 | | Unverified |
| 5 | Bhāskara-A (Fine-tuned, 2.7B) | Accuracy | 0.25 | | Unverified |
| 6 | Neo-A (Fine-tuned, 2.7B) | Accuracy | 0.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Codex (Few-Shot, 175B) | Accuracy | 0.59 | | Unverified |
| 2 | Bhāskara-P (Fine-tuned, 2.7B) | Accuracy | 0.45 | | Unverified |
| 3 | GPT-3 (Few-Shot, 175B) | Accuracy | 0.38 | | Unverified |
| 4 | Bhāskara-A (Fine-tuned, 2.7B) | Accuracy | 0.27 | | Unverified |
| 5 | Neo-P (Fine-tuned, 2.7B) | Accuracy | 0.24 | | Unverified |
| 6 | Neo-A (Fine-tuned, 2.7B) | Accuracy | 0.18 | | Unverified |
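
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GOLD | Completion accuracy | 65.8 | | Unverified |
| 2 | PGPSNet | Completion accuracy | 62.7 | | Unverified |
| 3 | GAPS | Completion accuracy | 61.2 | | Unverified |
| 4 | Inter-GPS | Completion accuracy | 59.8 | | Unverified |
| 5 | Geoformer | Completion accuracy | 35.6 | | Unverified |
| 6 | NGS | Completion accuracy | 34.1 | | Unverified |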

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | QWQ-32B-preview | Acc | 82.5 | | Unverified |
| 2 | Math-Master | Acc | 82 | | Unverified |
| 3 | Qwen2.5-Math-7B-instruct | Acc | 62.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GOLD | Accuracy (%) | 75.2 | | Unverified |
| 2 | GAPS | Accuracy (%) | 67.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Search-o1 | Acc | 86.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GOLD | Accuracy (%) | 98.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GAPS | Accuracy (%) | 97.5 | | Unverified |