| Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | Jan 22, 2025 | Math | Code Available | 1 | 5 |
| ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models | May 23, 2023 | Math | Code Available | 1 | 5 |
| ArMATH: a Dataset for Solving Arabic Math Word Problems | Jun 1, 2022 | Deep Learning, Math | Code Available | 1 | 5 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | May 30, 2025 | Autonomous Driving, Math | Code Available | 1 | 5 |
| Broken Neural Scaling Laws | Oct 26, 2022 | Adversarial Robustness, Continual Learning | Code Available | 1 | 5 |
| Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | Oct 28, 2024 | Arithmetic Reasoning, Math | Code Available | 1 | 5 |
| Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Jun 1, 2022 | Math | Code Available | 1 | 5 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1 | 5 |
| Pretrained Language Models are Symbolic Mathematics Solvers too! | Oct 7, 2021 | Ingenuity, Language Modelling | Code Available | 1 | 5 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 | 5 |