| Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection | Nov 13, 2024 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning | Nov 8, 2024 | Mathematical Reasoning | Code Available | 0 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | Nov 7, 2024 | Benchmarking, Computational Efficiency | Unverified | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | Nov 7, 2024 | GSM8K, Mathematical Problem-Solving | Unverified | 0 |
| FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI | Nov 7, 2024 | Mathematical Reasoning | Unverified | 0 |
| MoD: A Distribution-Based Approach for Merging Large Language Models | Nov 1, 2024 | Mathematical Reasoning | Code Available | 0 |
| STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Nov 1, 2024 | 2k, In-Context Learning | Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | Unverified | 0 |
| Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | Oct 29, 2024 | Mathematical Reasoning | Unverified | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | Oct 29, 2024 | Math, Mathematical Reasoning | Unverified | 0 |