GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–225 of 439 papers

Title	Date	Tasks	Status	Score
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems	Sep 30, 2024	GSM8KMath	CodeCode Available	5
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving	Oct 19, 2023	GSM8KMath	CodeCode Available	5
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual LearningGSM8K	CodeCode Available	5
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks	Oct 21, 2024	GSM8KSelf-Learning	CodeCode Available	5
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations	Nov 22, 2023	Common Sense ReasoningGSM8K	CodeCode Available	5
Text-to-LoRA: Instant Transformer Adaption	Jun 6, 2025	ARCGSM8K	CodeCode Available	5
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models	Jul 4, 2024	ARCGSM8K	CodeCode Available	5
The Price of Format: Diversity Collapse in LLMs	May 25, 2025	DiversityGSM8K	CodeCode Available	5
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation	Feb 19, 2025	Dataset GenerationGSM8K	CodeCode Available	5
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students	May 2, 2025	GSM8KIn-Context Learning	CodeCode Available	5
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting	Feb 5, 2025	GSM8KMath	CodeCode Available	5
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARCBenchmarking	CodeCode Available	5
Iterative Reasoning Preference Optimization	Apr 30, 2024	ARCGSM8K	—Unverified	0
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning	Apr 17, 2024	GSM8KNavigate	—Unverified	0
MALT: Improving Reasoning with Multi-Agent LLM Training	Dec 2, 2024	Common Sense ReasoningGSM8K	—Unverified	0
MAmmoTH2: Scaling Instructions from the Web	May 6, 2024	ChatbotGSM8K	—Unverified	0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8KMath	—Unverified	0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs	Jan 21, 2025	GSM8KIn-Context Learning	—Unverified	0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning	Jun 1, 2023	GSM8KLanguage Modeling	—Unverified	0
MathAttack: Attacking Large Language Models Towards Math Solving Ability	Sep 4, 2023	Adversarial AttackGSM8K	—Unverified	0
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models	Feb 18, 2025	Data AugmentationGSM8K	—Unverified	0
Instance-adaptive Zero-shot Chain-of-Thought Prompting	Sep 30, 2024	GSM8KMath	—Unverified	0
MathDivide: Improved mathematical reasoning by large language models	May 12, 2024	GSM8KLogical Reasoning	—Unverified	0
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling	Oct 18, 2024	Computational EfficiencyGSM8K	—Unverified	0
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task	Feb 17, 2025	Code CompletionGSM8K	—Unverified	0

Show:10 25 50

← PrevPage 9 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified