SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 81–90 of 439 papers

Title	Date	Tasks	Status	Hype
Exploring LLM Reasoning Through Controlled Prompt Variations	Apr 2, 2025	GSM8KMathematical Problem-Solving	CodeCode Available	0
Adaptive Rectification Sampling for Test-Time Compute Scaling	Apr 2, 2025	GSM8KLogical Reasoning	CodeCode Available	0
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8KMath	CodeCode Available	1
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models	Mar 28, 2025	GPUGSM8K	CodeCode Available	2
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR)GSM8K	CodeCode Available	7
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?	Mar 23, 2025	GSM8KMath	CodeCode Available	0
D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks	Mar 23, 2025	GSM8K	—Unverified	0
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models	Mar 21, 2025	GSM8KQuestion Answering	CodeCode Available	2
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging	Mar 21, 2025	GSM8KSafety Alignment	CodeCode Available	1
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs	Mar 18, 2025	GSM8KMath	—Unverified	0

Show:10 25 50

← PrevPage 9 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified