SOTAVerified

GSM8K

Papers

Showing 301-350 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0 |
| Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | | 0 |
| Reasoning Robustness of LLMs to Adversarial Typographical Errors | | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | | 0 |
| Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | | 0 |
| Dynamic Parallel Tree Search for Efficient LLM Reasoning | | 0 |
| Dual Decomposition of Weights and Singular Value Low Rank Adaptation | | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0 |
| Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures | | 0 |
| Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | | 0 |
| Relevant or Random: Can LLMs Truly Perform Analogical Reasoning? | | 0 |
| Reliable Reasoning Beyond Natural Language | | 0 |
| Rethinking Data Synthesis: A Teacher Model Training Recipe with Interpretation | | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0 |
| RevOrder: A Novel Method for Enhanced Arithmetic in Language Models | | 0 |
| Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | | 0 |
| RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs | | 0 |
| RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | | 0 |
| Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models | | 0 |
| S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners | | 0 |
| Does your data spark joy? Performance gains from domain upsampling at the end of training | | 0 |
| SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks | | 0 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | | 0 |
| Unsupervised Elicitation of Language Models | | 0 |
| DNA 1.0 Technical Report | | 0 |
| UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | | 0 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | | 0 |
| Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | | 0 |
| Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | | 0 |
| Dialectical Behavior Therapy Approach to LLM Prompting | | 0 |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | 0 |
| SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models | | 0 |
| D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | | 0 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic | | 0 |
| Self-Consistency Preference Optimization | | 0 |
| Self-Evaluation Guided Beam Search for Reasoning | | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0 |
| Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | | 0 |
| CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0 |
| Cost-Saving LLM Cascades with Early Abstention | | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | | 0 |
| Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |