SOTAVerified

GSM8K

Papers

Showing 421-430 of 439 papers

| Title | Status | Hype |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | | 0 |
| Large Language Models Can Self-Improve | | 0 |
| LiteSearch: Efficacious Tree Search for LLM | | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | | 0 |
| Large Language Models as Analogical Reasoners | | 0 |
| KwaiYiiMath: Technical Report | | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | | 0 |
| Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |