SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 241–250 of 439 papers

Title	Date	Tasks	Status	Hype
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task	Feb 17, 2025	Code CompletionGSM8K	—Unverified	0
Leveraging Uncertainty Estimation for Efficient LLM Routing	Feb 16, 2025	GSM8KMMLU	—Unverified	0
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning	Feb 16, 2025	GSM8K	—Unverified	0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs	Feb 16, 2025	GSM8KThompson Sampling	—Unverified	0
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Feb 16, 2025	Computational EfficiencyGSM8K	CodeCode Available	0
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization	Feb 14, 2025	GSM8KInference Optimization	—Unverified	0
Cost-Saving LLM Cascades with Early Abstention	Feb 13, 2025	GSM8KMMLU	—Unverified	0
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8KMath	CodeCode Available	0
Self-Training Large Language Models for Tool-Use Without Demonstrations	Feb 9, 2025	GSM8KMathematical Reasoning	—Unverified	0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8KMath	—Unverified	0

Show:10 25 50

← PrevPage 25 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified