GSM8K

Papers

Showing 226–250 of 439 papers

Title	Date	Tasks	Status
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning	Mar 6, 2025	GSM8K, Math	Unverified
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability	Mar 4, 2025	GSM8K, Logical Reasoning	Code Available
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models	Mar 4, 2025	GSM8K, Math	Unverified
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation	Feb 28, 2025	GSM8K	Code Available
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge	Feb 27, 2025	GSM8K, HumanEval	Unverified
Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning	Feb 26, 2025	GSM8K, Mathematical Reasoning	Unverified
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?	Feb 26, 2025	GSM8K, MMLU	Unverified
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models	Feb 25, 2025	Continual Learning, GSM8K	Unverified
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint	Feb 24, 2025	GSM8K	Unverified
Dynamic Parallel Tree Search for Efficient LLM Reasoning	Feb 22, 2025	Computational Efficiency, GSM8K	Unverified
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Feb 20, 2025	GSM8K, Math	Code Available
NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models	Feb 20, 2025	GSM8K, Natural Language Understanding	Code Available
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education	Feb 19, 2025	Diagnostic, GSM8K	Unverified
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation	Feb 19, 2025	Dataset Generation, GSM8K	Code Available
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models	Feb 18, 2025	Data Augmentation, GSM8K	Unverified
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task	Feb 17, 2025	Code Completion, GSM8K	Unverified
Leveraging Uncertainty Estimation for Efficient LLM Routing	Feb 16, 2025	GSM8K, MMLU	Unverified
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning	Feb 16, 2025	GSM8K	Unverified
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs	Feb 16, 2025	GSM8K, Thompson Sampling	Unverified
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Feb 16, 2025	Computational Efficiency, GSM8K	Code Available
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization	Feb 14, 2025	GSM8K, Inference Optimization	Unverified
Cost-Saving LLM Cascades with Early Abstention	Feb 13, 2025	GSM8K, MMLU	Unverified
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8K, Math	Code Available
Self-Training Large Language Models for Tool-Use Without Demonstrations	Feb 9, 2025	GSM8K, Mathematical Reasoning	Unverified
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8K, Math	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified