GSM8K

Papers


Showing 276–300 of 439 papers

Title	Date	Tasks	Status	Hype
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual Learning, GSM8K	Code Available	0
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B	Jun 11, 2024	Decision Making, GSM8K	Code Available	5
Uncertainty Aware Learning for Language Model Alignment	Jun 7, 2024	GSM8K, Language Modeling	Unverified	0
Improve Mathematical Reasoning in Language Models by Automated Process Supervision	Jun 5, 2024	GSM8K, Math	Unverified	0
Does your data spark joy? Performance gains from domain upsampling at the end of training	Jun 5, 2024	GSM8K, HumanEval	Unverified	0
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8K, HumanEval	Code Available	3
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment	May 30, 2024	GSM8K, Knowledge Distillation	Code Available	0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths	May 30, 2024	GSM8K, HumanEval	Unverified	0
Arithmetic Reasoning with LLM: Prolog Generation & Permutation	May 28, 2024	Arithmetic Reasoning, Data Augmentation	Unverified	0
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters	May 27, 2024	Benchmarking, GSM8K	Code Available	2
Multi-Reference Preference Optimization for Large Language Models	May 26, 2024	GSM8K, TruthfulQA	Unverified	0
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time	May 25, 2024	GSM8K, Math	Unverified	0
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training	May 23, 2024	GSM8K, Mixture-of-Experts	Code Available	7
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification	May 23, 2024	GPU, GSM8K	Code Available	1
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast	May 23, 2024	Computational Efficiency, GSM8K	Code Available	1
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step	May 23, 2024	GSM8K	Code Available	3
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	May 20, 2024	College Mathematics, GSM8K	Code Available	2
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving	May 20, 2024	GSM8K, Math	Unverified	0
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications	May 14, 2024	GSM8K, Math	Unverified	0
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning	May 13, 2024	Data Augmentation, GSM8K	Code Available	3
MathDivide: Improved mathematical reasoning by large language models	May 12, 2024	GSM8K, Logical Reasoning	Unverified	0
MAmmoTH2: Scaling Instructions from the Web	May 6, 2024	Chatbot, GSM8K	Unverified	0
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8K, Math	Code Available	2
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning	May 1, 2024	ARC, GSM8K	Code Available	3



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified