| Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks | Oct 10, 2024 | 8k, Diversity | —Unverified | 0 |
| Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Oct 10, 2024 | GSM8K, Math | Code Available | 2 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Oct 10, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Oct 10, 2024 | Arithmetic Reasoning, Math | Code Available | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Oct 10, 2024 | Mathematical Reasoning, Q-Learning | —Unverified | 0 |
| Herald: A Natural Language Annotated Lean 4 Dataset | Oct 9, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | Oct 9, 2024 | Mathematical Reasoning | —Unverified | 0 |
| PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness | Oct 9, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8K, Math | —Unverified | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 |