SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 281–290 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning	Dec 5, 2024	Few-Shot LearningGSM8K	—Unverified	0	0
Premise Order Matters in Reasoning with Large Language Models	Feb 14, 2024	GSM8KMathematical Problem-Solving	—Unverified	0	0
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models	Jun 12, 2025	GSM8KMathematical Reasoning	—Unverified	0	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8KMathematical Problem-Solving	—Unverified	0	0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation	Apr 16, 2025	GSM8KMath	—Unverified	0	0
Prompt Baking	Sep 4, 2024	ARCGSM8K	—Unverified	0	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8KMath	—Unverified	0	0
Prompt Engineering a Prompt Engineer	Nov 9, 2023	counterfactualCounterfactual Reasoning	—Unverified	0	0
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression	Mar 30, 2024	GSM8KRelation	—Unverified	0	0
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control	Mar 11, 2024	Code GenerationDiversity	—Unverified	0	0

Show:10 25 50

← PrevPage 29 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified