Arithmetic Reasoning
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet (HPT) | Accuracy | 97.72 | — | Unverified |
| 2 | DUP prompt upon GPT-4 | Accuracy | 97.1 | — | Unverified |
| 3 | Qwen2-Math-72B-Instruct (greedy) | Accuracy | 96.7 | — | Unverified |
| 4 | SFT-Mistral-7B (MetaMath, OVM, Smart Ensemble) | Accuracy | 96.4 | — | Unverified |
| 5 | OpenMath2-Llama3.1-70B (majority@256) | Accuracy | 96 | — | Unverified |
| 6 | Jiutian-大模型 | Accuracy | 95.2 | — | Unverified |
| 7 | DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | Accuracy | 95.1 | — | Unverified |
| 8 | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95 | — | Unverified |
| 9 | OpenMath2-Llama3.1-70B | Accuracy | 94.9 | — | Unverified |
| 10 | GPT-4 (Teaching-Inspired) | Accuracy | 94.8 | — | Unverified |