| Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models | Jan 10, 2025 | Math | —Unverified | 0 | 0 |
| Can you hear me now? Sensitive comparisons of human and machine perception | Mar 27, 2020 | Math, speech-recognition | —Unverified | 0 | 0 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | Oct 13, 2024 | Computational Efficiency, Math | —Unverified | 0 | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | Oct 29, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks | Oct 2, 2024 | Math, Navigate | —Unverified | 0 | 0 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | Oct 11, 2024 | Math, Spatial Reasoning | —Unverified | 0 | 0 |
| Dynamic Scheduling of MPI-based Distributed Deep Learning Training Jobs | Aug 21, 2019 | Deep Learning, Math | —Unverified | 0 | 0 |
| Dynamic Skill Adaptation for Large Language Models | Dec 26, 2024 | Math | —Unverified | 0 | 0 |
| Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems | Aug 10, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| EasyMath: A 0-shot Math Benchmark for SLMs | May 20, 2025 | Math | —Unverified | 0 | 0 |