SOTAVerified

Arithmetic Reasoning

Papers

Showing 131–140 of 175 papers

| Title | Status | Hype |
|---|---|---|
| Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | | 0 |
| Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | | 0 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering | | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| Exploring Group and Symmetry Principles in Large Language Models | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting | | 0 |

Benchmark Results

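| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet (HPT) | Accuracy | 97.72 | | Unverified |
| 2 | DUP prompt upon GPT-4 | Accuracy | 97.1 | | Unverified |
| 3 | Qwen2-Math-72B-Instruct (greedy) | Accuracy | 96.7 | | Unverified |
| 4 | SFT-Mistral-7B (Metamath, OVM, Smart Ensemble) | Accuracy | 96.4 | | Unverified |
| 5 | OpenMath2-Llama3.1-70B (majority@256) | Accuracy | 96 | | Unverified |
| 6 | Jiutian-大模型 | Accuracy | 95.2 | | Unverified |
| 7 | DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | Accuracy | 95.1 | | Unverified |
| 8 | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95 | | Unverified |
| 9 | OpenMath2-Llama3.1-70B | Accuracy | 94.9 | | Unverified |
| 10 | GPT-4 (Teaching-Inspired) | Accuracy | 94.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Text-davinci-002 (175B) (zero-shot-cot) | Accuracy | 78.7 | | Unverified |
| 2 | Text-davinci-002 (175B) (zero-shot) | Accuracy | 17.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tree of Thoughts (b=5) | Success | 0.74 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 92.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 89.2 | | Unverified |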