GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 439 papers

Title	Date	Tasks	Status	Hype
Balancing LoRA Performance and Efficiency with Simple Shard Sharing	Sep 19, 2024	Computational EfficiencyGSM8K	CodeCode Available	2
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning	Sep 19, 2024	GSM8KLogical Reasoning	CodeCode Available	0
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	Sep 18, 2024	GSM8KMath	—Unverified	0
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent	Sep 17, 2024	GSM8KQuestion Answering	CodeCode Available	1
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks	Sep 13, 2024	ARCCode Generation	—Unverified	0
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Sep 10, 2024	GSM8KMixture-of-Experts	—Unverified	0
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation	Sep 5, 2024	GSM8K	—Unverified	0
Prompt Baking	Sep 4, 2024	ARCGSM8K	—Unverified	0
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8KMath	CodeCode Available	2
Building Math Agents with Multi-Turn Iterative Preference Learning	Sep 4, 2024	GSM8KMath	—Unverified	0
S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners	Sep 3, 2024	GSM8KMath	—Unverified	0
Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems	Aug 29, 2024	GSM8KLanguage Modeling	—Unverified	0
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic	Aug 29, 2024	GSM8KLanguage Modeling	—Unverified	0
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models	Aug 28, 2024	Data AugmentationGSM8K	—Unverified	0
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models	Aug 21, 2024	8kGSM8K	CodeCode Available	1
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs	Aug 18, 2024	DiversityGPU	—Unverified	0
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models	Aug 16, 2024	GSM8KMMLU	—Unverified	0
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers	Aug 12, 2024	GSM8KMath	CodeCode Available	4
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8KLanguage Modeling	CodeCode Available	1
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling	Jul 31, 2024	GSM8KMath	CodeCode Available	3
Cool-Fusion: Fuse Large Language Models without Training	Jul 29, 2024	Combinatorial OptimizationGSM8K	—Unverified	0
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process	Jul 29, 2024	GSM8KMath	CodeCode Available	2
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Jul 29, 2024	GSM8KPrompt Engineering	—Unverified	0
Learning Goal-Conditioned Representations for Language Reward Models	Jul 18, 2024	GSM8KMath	CodeCode Available	1
Weak-to-Strong Reasoning	Jul 18, 2024	GSM8KMath	CodeCode Available	2

Show:10 25 50

← PrevPage 10 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified