SOTAVerified

GSM8K

Papers

Showing 326-350 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| DNA 1.0 Technical Report | | 0 |
| UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities | | 0 |
| DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | | 0 |
| Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | | 0 |
| Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | | 0 |
| Dialectical Behavior Therapy Approach to LLM Prompting | | 0 |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | | 0 |
| SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models | | 0 |
| D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | | 0 |
| Self-Consistency Boosts Calibration for Math Reasoning | | 0 |
| Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic | | 0 |
| Self-Consistency Preference Optimization | | 0 |
| Self-Evaluation Guided Beam Search for Reasoning | | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | | 0 |
| CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0 |
| Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | | 0 |
| CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0 |
| Cost-Saving LLM Cascades with Early Abstention | | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | | 0 |
| Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |