| Scaling up Masked Diffusion Models on Text | Oct 24, 2024 | GSM8K, Language Modeling | Code Available | 3 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | Oct 24, 2024 | Math, Mathematical Reasoning | Code Available | 2 |
| MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Oct 23, 2024 | Math, Mixture-of-Experts | Unverified | 0 |
| Non-myopic Generation of Language Models for Reasoning and Planning | Oct 22, 2024 | Computational Efficiency, Language Modelling | Code Available | 1 |
| Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality | Oct 22, 2024 | Math | Unverified | 0 |
| Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | Oct 22, 2024 | GSM8K, Language Modeling | Code Available | 1 |
| Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | Oct 22, 2024 | GSM8K, Math | Unverified | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | Oct 22, 2024 | Math | Unverified | 0 |
| JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | Oct 22, 2024 | Math | Unverified | 0 |