SOTAVerified

GSM8K

Papers

Showing 151-160 of 439 papers

Title | Status | Hype
Matrix Information Theory for Self-Supervised Learning | Code | 1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates | Code | 1
Learning Goal-Conditioned Representations for Language Reward Models | Code | 1
SMART: Self-Aware Agent for Tool Overuse Mitigation | Code | 1
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback | Code | 1
Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement | Code | 0
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified