SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–210 of 439 papers

Title	Date	Tasks	Status	Hype
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models	Oct 10, 2024	GSM8KMath	CodeCode Available	2
Think Beyond Size: Adaptive Prompting for More Effective Reasoning	Oct 10, 2024	Arithmetic ReasoningComputational Efficiency	—Unverified	0
Dialectical Behavior Therapy Approach to LLM Prompting	Oct 10, 2024	GSM8KStrategyQA	—Unverified	0
Subtle Errors Matter: Preference Learning via Error-injected Self-editing	Oct 9, 2024	GSM8KMath	—Unverified	0
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches	Oct 8, 2024	GPUGSM8K	—Unverified	0
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning	Oct 8, 2024	GSM8KMulti-agent Reinforcement Learning	CodeCode Available	1
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Oct 8, 2024	GSM8KHallucination	—Unverified	0
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths	Oct 7, 2024	AttributeGSM8K	—Unverified	0
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models	Oct 7, 2024	GSM8KLogical Reasoning	CodeCode Available	1
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification	Oct 5, 2024	GSM8KMath	—Unverified	0

Show:10 25 50

← PrevPage 21 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified