SOTAVerified

GSM8K

Papers

Showing 141–150 of 439 papers

Title | Status | Hype
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Code | 1
Learning From Mistakes Makes LLM Better Reasoner | Code | 1
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models | Code | 1
Design of Chain-of-Thought in Math Problem Solving | Code | 1
Large Language Models as Optimizers | Code | 1
AskIt: Unified Programming Interface for Programming with Large Language Models | Code | 1
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | Code | 1
Matrix Information Theory for Self-Supervised Learning | Code | 1
GRACE: Discriminator-Guided Chain-of-Thought Reasoning | Code | 1
Automatic Model Selection with Large Language Models for Reasoning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified