SOTAVerified

Mathematical Reasoning

Papers

Showing 751–800 of 805 papers

Title | Status | Hype
Notes on a Path to AI Assistance in Mathematical Reasoning | | 0
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | | 0
LPML: LLM-Prompting Markup Language for Mathematical Reasoning | | 0
Code Soliloquies for Accurate Calculations in Large Language Models | Code | 0
On the meaning of uncertainty for ethical AI: philosophy and practice | | 0
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | | 0
Probabilistic Results on the Architecture of Mathematical Reasoning Aligned by Cognitive Alternation | | 0
Forward-Backward Reasoning in Large Language Models for Mathematical Verification | | 0
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models | | 0
MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning | | 0
MWPRanker: An Expression Similarity Based Math Word Problem Retriever | | 0
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | Code | 0
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0
Position: AI Evaluation Should Learn from How We Test Humans | Code | 0
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | | 0
Random Feedback Alignment Algorithms to train Neural Networks: Why do they Align? | | 0
A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers | | 0
Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering | | 0
Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning | | 0
Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting | Code | 0
Reliable Natural Language Understanding with Large Language Models and Answer Set Programming | | 0
Techniques to Improve Neural Math Word Problem Solvers | Code | 0
LEMMA: Bootstrapping High-Level Mathematical Reasoning with Learned Symbolic Abstractions | Code | 0
Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic | Code | 0
Blank Collapse: Compressing CTC emission for the faster decoding | Code | 0
Composing Ensembles of Pre-trained Models via Iterative Consensus | | 0
Learning to Reason With Relational Abstractions | | 0
Weakly Supervised Formula Learner for Solving Mathematical Problems | Code | 0
Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation | Code | 0
MMTM: Multi-Tasking Multi-Decoder Transformer for Math Word Problems | | 0
Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers | | 0
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks | | 0
Enhancing Neural Mathematical Reasoning by Abductive Combination with Symbolic Library | | 0
Why are NLP Models Fumbling at Elementary Math? A Survey of Automatic Word Problem Solvers | | 0
Theoretical Analysis of an XGBoost Framework for Product Cannibalization | | 0
GraphMR: Graph Neural Network for Mathematical Reasoning | | 0
Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems | | 0
Conjectures, Tests and Proofs: An Overview of Theory Exploration | | 0
Reasoning with Transformer-based Models: Deep Learning, but Shallow Reasoning | Code | 0
Compositional Processing Emerges in Neural Networks Solving Math Problems | Code | 0
Sustainability of Collusion and Market Transparency in a Sequential Search Market: a Generalization | | 0
The Role of General Intelligence in Mathematical Reasoning | | 0
Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units | | 0
SMART: A Situation Model for Algebra Story Problems via Attributed Grammar | | 0
Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes | | 0
Reverse Operation based Data Augmentation for Solving Math Word Problems | Code | 0
Adventures in Mathematical Reasoning | | 0
Mathematical Reasoning via Self-supervised Skip-tree Training | | 0
Compositional Generalization with Tree Stack Memory Units | Code | 0
Mathematical Reasoning in Latent Space | | 0
Page 16 of 17

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Acc | 94.4 | | Unverified
2 | DeepSeek-r1 | Acc | 79.8 | | Unverified
3 | Openai-o1 | Acc | 74.4 | | Unverified
4 | Openai-o1-mini | Acc | 70 | | Unverified
5 | Search-o1 | Acc | 56.7 | | Unverified
6 | s1-32B | Acc | 56.7 | | Unverified
7 | Openai-o1-preview | Acc | 44.6 | | Unverified
8 | Qwen2.5-72B-Instruct | Acc | 23.3 | | Unverified
9 | Claude3.5-Sonnet | Acc | 16 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | o3 | Accuracy | 0.25 | | Unverified
2 | Gemini 1.5 Pro (002) | Accuracy | 0.02 | | Unverified
3 | GPT-4o | Accuracy | 0.01 | | Unverified
4 | o1-mini | Accuracy | 0.01 | | Unverified
5 | o1-preview | Accuracy | 0.01 | | Unverified
6 | Claude 3.5 Sonnet | Accuracy | 0.01 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Codex (Few-Shot, 175B) | Accuracy | 0.6 | | Unverified
2 | Bhāskara-P (Fine-tuned, 2.7B) | Accuracy | 0.48 | | Unverified
3 | Neo-P (Fine-tuned, 2.7B) | Accuracy | 0.39 | | Unverified
4 | GPT-3 (Few-Shot, 175B) | Accuracy | 0.38 | | Unverified
5 | Bhāskara-A (Fine-tuned, 2.7B) | Accuracy | 0.25 | | Unverified
6 | Neo-A (Fine-tuned, 2.7B) | Accuracy | 0.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Codex (Few-Shot, 175B) | Accuracy | 0.59 | | Unverified
2 | Bhāskara-P (Fine-tuned, 2.7B) | Accuracy | 0.45 | | Unverified
3 | GPT-3 (Few-Shot, 175B) | Accuracy | 0.38 | | Unverified
4 | Bhāskara-A (Fine-tuned, 2.7B) | Accuracy | 0.27 | | Unverified
5 | Neo-P (Fine-tuned, 2.7B) | Accuracy | 0.24 | | Unverified
6 | Neo-A (Fine-tuned, 2.7B) | Accuracy | 0.18 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GOLD | Completion accuracy | 65.8 | | Unverified
2 | PGPSNet | Completion accuracy | 62.7 | | Unverified
3 | GAPS | Completion accuracy | 61.2 | | Unverified
4 | Inter-GPS | Completion accuracy | 59.8 | | Unverified
5 | Geoformer | Completion accuracy | 35.6 | | Unverified
6 | NGS | Completion accuracy | 34.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | QWQ-32B-preview | Acc | 82.5 | | Unverified
2 | Math-Master | Acc | 82 | | Unverified
3 | Qwen2.5-Math-7B-instruct | Acc | 62.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GOLD | Accuracy (%) | 75.2 | | Unverified
2 | GAPS | Accuracy (%) | 67.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Search-o1 | Acc | 86.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GOLD | Accuracy (%) | 98.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GAPS | Accuracy (%) | 97.5 | | Unverified