SOTAVerified

Arithmetic Reasoning

Papers

Showing 101110 of 175 papers

TitleStatusHype
ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math QuestionsCode0
Do Deep Neural Networks Capture Compositionality in Arithmetic Reasoning?Code0
Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic FeedbackCode0
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic ReasoningCode0
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language ModelsCode0
Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced ReasoningCode0
LLM Augmented LLMs: Expanding Capabilities through CompositionCode0
Self-training Language Models for Arithmetic ReasoningCode0
Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training0
Leveraging LLM Reasoning Enhances Personalized Recommender Systems0
Show:102550
← PrevPage 11 of 18Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Claude 3.5 Sonnet (HPT)Accuracy97.72Unverified
2DUP prompt upon GPT-4Accuracy97.1Unverified
3Qwen2-Math-72B-Instruct (greedy)Accuracy96.7Unverified
4SFT-Mistral-7B (Metamath, OVM, Smart Ensemble)Accuracy96.4Unverified
5OpenMath2-Llama3.1-70B (majority@256)Accuracy96Unverified
6Jiutian-大模型Accuracy95.2Unverified
7DAMOMath-7B(MetaMath, OVM, BS, Ensemble)Accuracy95.1Unverified
8Claude 3 Opus (0-shot chain-of-thought)Accuracy95Unverified
9OpenMath2-Llama3.1-70BAccuracy94.9Unverified
10GPT-4 (Teaching-Inspired)Accuracy94.8Unverified
#ModelMetricClaimedVerifiedStatus
1Text-davinci-002 (175B)(zero-shot-cot)Accuracy78.7Unverified
2Text-davinci-002 (175B) (zero-shot)Accuracy17.7Unverified
#ModelMetricClaimedVerifiedStatus
1Tree of Thoughts (b=5)Success0.74Unverified
#ModelMetricClaimedVerifiedStatus
1GPT-4 (Teaching-Inspired)Accuracy92.2Unverified
#ModelMetricClaimedVerifiedStatus
1GPT-4 (Teaching-Inspired)Accuracy89.2Unverified