SOTAVerified

GSM8K

Papers

Showing 81-90 of 439 papers

Title | Status | Hype
Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0
Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Qwen2.5-Omni Technical Report | Code | 7
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0
D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | | 0
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Code | 1
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified