SOTAVerified

Arithmetic Reasoning

Papers

Showing 101–110 of 175 papers

Title | Status | Hype
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | | 0
Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | | 0
On Representational Dissociation of Language and Arithmetic in Large Language Models | | 0
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | | 0
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | | 0
CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | | 0
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | | 0
DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models | | 0
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | | 0
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Claude 3.5 Sonnet (HPT) | Accuracy | 97.72 | | Unverified
2 | DUP prompt upon GPT-4 | Accuracy | 97.1 | | Unverified
3 | Qwen2-Math-72B-Instruct (greedy) | Accuracy | 96.7 | | Unverified
4 | SFT-Mistral-7B (Metamath, OVM, Smart Ensemble) | Accuracy | 96.4 | | Unverified
5 | OpenMath2-Llama3.1-70B (majority@256) | Accuracy | 96 | | Unverified
6 | Jiutian-大模型 | Accuracy | 95.2 | | Unverified
7 | DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | Accuracy | 95.1 | | Unverified
8 | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95 | | Unverified
9 | OpenMath2-Llama3.1-70B | Accuracy | 94.9 | | Unverified
10 | GPT-4 (Teaching-Inspired) | Accuracy | 94.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Text-davinci-002 (175B) (zero-shot-cot) | Accuracy | 78.7 | | Unverified
2 | Text-davinci-002 (175B) (zero-shot) | Accuracy | 17.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Tree of Thoughts (b=5) | Success | 0.74 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 (Teaching-Inspired) | Accuracy | 92.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 (Teaching-Inspired) | Accuracy | 89.2 | | Unverified