SOTAVerified

Arithmetic Reasoning

Papers

Showing 126–150 of 175 papers

| Title | Status | Hype |
|---|---|---|
| SBoRA: Low-Rank Adaptation with Regional Weight Updates | Code | 0 |
| Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback | Code | 0 |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Code | 0 |
| Arithmetic Reasoning with LLM: Prolog Generation & Permutation | | 0 |
| Large Language Models Can Self-Correct with Key Condition Verification | | 0 |
| Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | | 0 |
| Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | | 0 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Code | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering | | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| Exploring Group and Symmetry Principles in Large Language Models | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |
| LLM Augmented LLMs: Expanding Capabilities through Composition | Code | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | | 0 |
| Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Code | 0 |
| ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions | Code | 0 |
| Orca 2: Teaching Small Language Models How to Reason | | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | | 0 |
| Prompt Sketching for Large Language Models | | 0 |
| KwaiYiiMath: Technical Report | | 0 |

Benchmark Results

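| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet (HPT) | Accuracy | 97.72 | | Unverified |
| 2 | DUP prompt upon GPT-4 | Accuracy | 97.1 | | Unverified |
| 3 | Qwen2-Math-72B-Instruct (greedy) | Accuracy | 96.7 | | Unverified |
| 4 | SFT-Mistral-7B (Metamath, OVM, Smart Ensemble) | Accuracy | 96.4 | | Unverified |
| 5 | OpenMath2-Llama3.1-70B (majority@256) | Accuracy | 96 | | Unverified |
| 6 | Jiutian-大模型 | Accuracy | 95.2 | | Unverified |
| 7 | DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | Accuracy | 95.1 | | Unverified |
| 8 | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95 | | Unverified |
| 9 | OpenMath2-Llama3.1-70B | Accuracy | 94.9 | | Unverified |
| 10 | GPT-4 (Teaching-Inspired) | Accuracy | 94.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Text-davinci-002 (175B) (zero-shot-cot) | Accuracy | 78.7 | | Unverified |
| 2 | Text-davinci-002 (175B) (zero-shot) | Accuracy | 17.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tree of Thoughts (b=5) | Success | 0.74 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 92.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 89.2 | | Unverified |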