GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–225 of 439 papers

Title	Date	Tasks	Status	Hype
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models	Oct 10, 2024	GSM8KMath	CodeCode Available	2
Dialectical Behavior Therapy Approach to LLM Prompting	Oct 10, 2024	GSM8KStrategyQA	—Unverified	0
Think Beyond Size: Adaptive Prompting for More Effective Reasoning	Oct 10, 2024	Arithmetic ReasoningComputational Efficiency	—Unverified	0
Subtle Errors Matter: Preference Learning via Error-injected Self-editing	Oct 9, 2024	GSM8KMath	—Unverified	0
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches	Oct 8, 2024	GPUGSM8K	—Unverified	0
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning	Oct 8, 2024	GSM8KMulti-agent Reinforcement Learning	CodeCode Available	1
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Oct 8, 2024	GSM8KHallucination	—Unverified	0
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths	Oct 7, 2024	AttributeGSM8K	—Unverified	0
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models	Oct 7, 2024	GSM8KLogical Reasoning	CodeCode Available	1
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification	Oct 5, 2024	GSM8KMath	—Unverified	0
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	Oct 4, 2024	DiversityEnsemble Pruning	CodeCode Available	0
BrainTransformers: SNN-LLM	Oct 3, 2024	ARCGSM8K	—Unverified	0
Unlocking Structured Thinking in Language Models with Cognitive Prompting	Oct 3, 2024	Arithmetic ReasoningGSM8K	—Unverified	0
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning	Oct 3, 2024	GSM8KLanguage Modeling	—Unverified	0
The Role of Deductive and Inductive Reasoning in Large Language Models	Oct 3, 2024	GSM8K	—Unverified	0
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation	Oct 3, 2024	GSM8KMath	—Unverified	0
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation	Oct 2, 2024	Data AugmentationDiversity	—Unverified	0
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	Oct 2, 2024	GSM8KMath	CodeCode Available	2
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems	Sep 30, 2024	GSM8KMath	CodeCode Available	0
Instance-adaptive Zero-shot Chain-of-Thought Prompting	Sep 30, 2024	GSM8KMath	—Unverified	0
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning	Sep 25, 2024	GSM8KMath	—Unverified	0
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ	Sep 25, 2024	ChatbotGSM8K	—Unverified	0
Uncovering Latent Chain of Thought Vectors in Language Models	Sep 21, 2024	ARCGSM8K	—Unverified	0
Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks	Sep 20, 2024	ARCGSM8K	CodeCode Available	1
ControlMath: Controllable Data Generation Promotes Math Generalist Models	Sep 20, 2024	Data AugmentationDiversity	—Unverified	0

Show:10 25 50

← PrevPage 9 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified