| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Oct 10, 2024 | GSM8KMath | CodeCode Available | 2 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic ReasoningComputational Efficiency | —Unverified | 0 |
| Dialectical Behavior Therapy Approach to LLM Prompting | Oct 10, 2024 | GSM8KStrategyQA | —Unverified | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8KMath | —Unverified | 0 |
| PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | Oct 8, 2024 | GPUGSM8K | —Unverified | 0 |
| Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning | Oct 8, 2024 | GSM8KMulti-agent Reinforcement Learning | CodeCode Available | 1 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8KHallucination | —Unverified | 0 |
| Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | Oct 7, 2024 | AttributeGSM8K | —Unverified | 0 |
| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | Oct 7, 2024 | GSM8KLogical Reasoning | CodeCode Available | 1 |
| Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | Oct 5, 2024 | GSM8KMath | —Unverified | 0 |