GSM8K

Papers

Showing 91–100 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning	Oct 8, 2024	GSM8K, Multi-agent Reinforcement Learning	Code Available	1	5
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning	Jan 27, 2023	Few-Shot Learning, GSM8K	Code Available	1	5
Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks	Sep 20, 2024	ARC, GSM8K	Code Available	1	5
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs	Nov 16, 2023	Arithmetic Reasoning, GSM8K	Code Available	1	5
Language Models as Science Tutors	Feb 16, 2024	GSM8K, Math	Code Available	1	5
Large Language Models as Optimizers	Sep 7, 2023	GSM8K	Code Available	1	5
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation	Dec 28, 2023	GSM8K, Language Model Evaluation	Code Available	1	5
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	Apr 23, 2024	Arithmetic Reasoning, GSM8K	Code Available	1	5
Large Language Models are Contrastive Reasoners	Mar 13, 2024	GSM8K	Code Available	1	5
AskIt: Unified Programming Interface for Programming with Large Language Models	Aug 29, 2023	Code Generation, Few-Shot Learning	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified