Arithmetic Reasoning
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet (HPT) | Accuracy | 97.72 | — | Unverified |
| 2 | DUP prompt upon GPT-4 | Accuracy | 97.1 | — | Unverified |
| 3 | Qwen2-Math-72B-Instruct (greedy) | Accuracy | 96.7 | — | Unverified |
| 4 | SFT-Mistral-7B (MetaMath, OVM, Smart Ensemble) | Accuracy | 96.4 | — | Unverified |
| 5 | OpenMath2-Llama3.1-70B (majority@256) | Accuracy | 96 | — | Unverified |
| 6 | Jiutian-大模型 | Accuracy | 95.2 | — | Unverified |
| 7 | DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | Accuracy | 95.1 | — | Unverified |
| 8 | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95 | — | Unverified |
| 9 | OpenMath2-Llama3.1-70B | Accuracy | 94.9 | — | Unverified |
| 10 | GPT-4 (Teaching-Inspired) | Accuracy | 94.8 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Text-davinci-002 (175B) (zero-shot-cot) | Accuracy | 78.7 | — | Unverified |
| 2 | Text-davinci-002 (175B) (zero-shot) | Accuracy | 17.7 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tree of Thoughts (b=5) | Success | 0.74 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 92.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 89.2 | — | Unverified |