SOTAVerified

Arithmetic Reasoning

Papers

Showing 101–150 of 175 papers

| Title | Status | Hype |
|-------|--------|------|
| The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | | 0 |
| Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights | | 0 |
| On Representational Dissociation of Language and Arithmetic in Large Language Models | | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | | 0 |
| Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | | 0 |
| CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization | | 0 |
| SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | | 0 |
| DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models | | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | | 0 |
| Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs | | 0 |
| Hint Marginalization for Improved Reasoning in Large Language Models | | 0 |
| GaLore+: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection | | 0 |
| S^2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity | | 0 |
| Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Arithmetic Reasoning | | 0 |
| PERFT: Parameter-Efficient Routed Fine-Tuning for Mixture-of-Expert Model | | 0 |
| Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning | Code | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Code | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | | 0 |
| Small Language Models are Equation Reasoners | | 0 |
| 3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability | Code | 0 |
| Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks | | 0 |
| Leveraging LLM Reasoning Enhances Personalized Recommender Systems | | 0 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | | 0 |
| Self-training Language Models for Arithmetic Reasoning | Code | 0 |
| SBoRA: Low-Rank Adaptation with Regional Weight Updates | Code | 0 |
| Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback | Code | 0 |
| Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Code | 0 |
| Arithmetic Reasoning with LLM: Prolog Generation & Permutation | | 0 |
| Large Language Models Can Self-Correct with Key Condition Verification | | 0 |
| Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | | 0 |
| Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | | 0 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Code | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering | | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| Exploring Group and Symmetry Principles in Large Language Models | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |
| LLM Augmented LLMs: Expanding Capabilities through Composition | Code | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | | 0 |
| Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Code | 0 |
| ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions | Code | 0 |
| Orca 2: Teaching Small Language Models How to Reason | | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | | 0 |
| Prompt Sketching for Large Language Models | | 0 |
| KwaiYiiMath: Technical Report | | 0 |
Page 3 of 4

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | Claude 3.5 Sonnet (HPT) | Accuracy | 97.72 | | Unverified |
| 2 | DUP prompt upon GPT-4 | Accuracy | 97.1 | | Unverified |
| 3 | Qwen2-Math-72B-Instruct (greedy) | Accuracy | 96.7 | | Unverified |
| 4 | SFT-Mistral-7B (Metamath, OVM, Smart Ensemble) | Accuracy | 96.4 | | Unverified |
| 5 | OpenMath2-Llama3.1-70B (majority@256) | Accuracy | 96 | | Unverified |
| 6 | Jiutian-大模型 | Accuracy | 95.2 | | Unverified |
| 7 | DAMOMath-7B (MetaMath, OVM, BS, Ensemble) | Accuracy | 95.1 | | Unverified |
| 8 | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95 | | Unverified |
| 9 | OpenMath2-Llama3.1-70B | Accuracy | 94.9 | | Unverified |
| 10 | GPT-4 (Teaching-Inspired) | Accuracy | 94.8 | | Unverified |
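Several entries above report scores under a "majority@k" decoding setup (e.g. OpenMath2-Llama3.1-70B at majority@256): the model is sampled k times per problem and the plurality final answer is the one scored. A minimal sketch of that aggregation step (the sample answers below are hypothetical):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer among k sampled completions.

    This is the aggregation behind "majority@k" leaderboard entries:
    sample the model k times, extract each final answer, and score
    the plurality answer (also known as self-consistency decoding).
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]

# Hypothetical extracted answers from 5 samples of one problem:
samples = ["42", "42", "41", "42", "17"]
print(majority_vote(samples))  # "42"
```

Note that greedy decoding (as in the Qwen2-Math-72B-Instruct entry) is the k=1 special case with temperature 0, which is why majority@k and greedy numbers for the same model are not directly comparable.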
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | Text-davinci-002 (175B) (zero-shot-cot) | Accuracy | 78.7 | | Unverified |
| 2 | Text-davinci-002 (175B) (zero-shot) | Accuracy | 17.7 | | Unverified |
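The large gap between the zero-shot-cot and zero-shot rows for Text-davinci-002 (78.7 vs. 17.7) comes from zero-shot chain-of-thought prompting: appending a reasoning trigger such as "Let's think step by step." so the model produces intermediate reasoning before its final answer. A minimal sketch of the two prompt formats (the question and exact answer-extraction phrasing are illustrative assumptions, not taken from this page):

```python
def build_prompt(question: str, cot: bool = False) -> str:
    """Build a zero-shot prompt for an arithmetic question.

    With cot=True, append the chain-of-thought trigger phrase so the
    model reasons step by step before the final answer is extracted.
    """
    if cot:
        # Zero-shot-CoT: elicit intermediate reasoning first.
        return f"Q: {question}\nA: Let's think step by step."
    # Plain zero-shot: ask for the final answer directly.
    return f"Q: {question}\nA: The answer is"

# Hypothetical arithmetic question:
q = "A baker made 24 rolls and sold 9. How many rolls are left?"
print(build_prompt(q, cot=True))
```

In the full zero-shot-CoT pipeline, the model's reasoning output is fed back with an answer-extraction prompt to pull out the final numeral, which is then compared against the gold answer.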
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | Tree of Thoughts (b=5) | Success | 0.74 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 92.2 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|-------|--------|---------|----------|--------|
| 1 | GPT-4 (Teaching-Inspired) | Accuracy | 89.2 | | Unverified |