Mathematical Reasoning
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy (%) | 94.4 | — | Unverified |
| 2 | DeepSeek-R1 | Accuracy (%) | 79.8 | — | Unverified |
| 3 | OpenAI-o1 | Accuracy (%) | 74.4 | — | Unverified |
| 4 | OpenAI-o1-mini | Accuracy (%) | 70 | — | Unverified |
| 5 | s1-32B | Accuracy (%) | 56.7 | — | Unverified |
| 6 | Search-o1 | Accuracy (%) | 56.7 | — | Unverified |
| 7 | OpenAI-o1-preview | Accuracy (%) | 44.6 | — | Unverified |
| 8 | Qwen2.5-72B-Instruct | Accuracy (%) | 23.3 | — | Unverified |
| 9 | Claude 3.5 Sonnet | Accuracy (%) | 16 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | o3 | Accuracy | 0.25 | — | Unverified |
| 2 | Gemini 1.5 Pro (002) | Accuracy | 0.02 | — | Unverified |
| 3 | o1-preview | Accuracy | 0.01 | — | Unverified |
| 4 | GPT-4o | Accuracy | 0.01 | — | Unverified |
| 5 | Claude 3.5 Sonnet | Accuracy | 0.01 | — | Unverified |
| 6 | o1-mini | Accuracy | 0.01 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Codex (Few-Shot, 175B) | Accuracy | 0.6 | — | Unverified |
| 2 | Bhāskara-P (Fine-tuned, 2.7B) | Accuracy | 0.48 | — | Unverified |
| 3 | Neo-P (Fine-tuned, 2.7B) | Accuracy | 0.39 | — | Unverified |
| 4 | GPT-3 (Few-Shot, 175B) | Accuracy | 0.38 | — | Unverified |
| 5 | Bhāskara-A (Fine-tuned, 2.7B) | Accuracy | 0.25 | — | Unverified |
| 6 | Neo-A (Fine-tuned, 2.7B) | Accuracy | 0.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Codex (Few-Shot, 175B) | Accuracy | 0.59 | — | Unverified |
| 2 | Bhāskara-P (Fine-tuned, 2.7B) | Accuracy | 0.45 | — | Unverified |
| 3 | GPT-3 (Few-Shot, 175B) | Accuracy | 0.38 | — | Unverified |
| 4 | Bhāskara-A (Fine-tuned, 2.7B) | Accuracy | 0.27 | — | Unverified |
| 5 | Neo-P (Fine-tuned, 2.7B) | Accuracy | 0.24 | — | Unverified |
| 6 | Neo-A (Fine-tuned, 2.7B) | Accuracy | 0.18 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | QwQ-32B-Preview | Accuracy (%) | 82.5 | — | Unverified |
| 2 | Math-Master | Accuracy (%) | 82 | — | Unverified |
| 3 | Qwen2.5-Math-7B-Instruct | Accuracy (%) | 62.5 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Search-o1 | Accuracy (%) | 86.4 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GOLD | Accuracy (%) | 98.5 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GAPS | Accuracy (%) | 97.5 | — | Unverified |