| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8K, Math | Code Available | 1 |
| SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging | Mar 21, 2025 | GSM8K, Safety Alignment | Code Available | 1 |
| Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models | Mar 16, 2025 | Data Augmentation, GSM8K | Code Available | 1 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8K, Math | Code Available | 1 |
| Self-Training Elicits Concise Reasoning in Large Language Models | Feb 27, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| SMART: Self-Aware Agent for Tool Overuse Mitigation | Feb 17, 2025 | GSM8K, Large Language Model | Code Available | 1 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Jan 20, 2025 | Decision Making, GSM8K | Code Available | 1 |
| Entropy-Regularized Process Reward Model | Dec 15, 2024 | GSM8K, Math | Code Available | 1 |