| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | BenchmarkingCode Generation | CodeCode Available | 0 | 5 |
| FINNger -- Applying artificial intelligence to ease math learning for children | May 26, 2021 | Hand Pose EstimationMath | CodeCode Available | 0 | 5 |
| ChatBench: From Static Benchmarks to Human-AI Evaluation | Mar 22, 2025 | MathMMLU | CodeCode Available | 0 | 5 |
| Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems | Nov 2, 2018 | DecoderMath | CodeCode Available | 0 | 5 |
| AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First - Using Relation Extraction to Identify Entities | Jul 1, 2022 | Joint Entity and Relation ExtractionMath | CodeCode Available | 0 | 5 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Oct 3, 2023 | GSM8KMath | CodeCode Available | 0 | 5 |
| Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | Feb 24, 2025 | MathMathematical Reasoning | CodeCode Available | 0 | 5 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data VisualizationMath | CodeCode Available | 0 | 5 |
| Library Learning Doesn't: The Curious Case of the Single-Use "Library" | Oct 26, 2024 | MathMathematical Reasoning | CodeCode Available | 0 | 5 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Jan 14, 2025 | GSM8KMath | CodeCode Available | 0 | 5 |
| AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First -- Using Relation Extraction to Identify Entities | Mar 10, 2022 | Joint Entity and Relation ExtractionMath | CodeCode Available | 0 | 5 |
| Faithful Chain-of-Thought Reasoning | Jan 31, 2023 | MathMulti-hop Question Answering | CodeCode Available | 0 | 5 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | MathMathematical Reasoning | CodeCode Available | 0 | 5 |
| Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification | Nov 4, 2024 | MathReranking | CodeCode Available | 0 | 5 |
| Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning | May 29, 2023 | Language ModellingLarge Language Model | CodeCode Available | 0 | 5 |
| Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process | May 10, 2024 | Geometry Problem SolvingMachine Translation | CodeCode Available | 0 | 5 |
| A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving | Jan 7, 2025 | DiversityKnowledge Distillation | CodeCode Available | 0 | 5 |
| Leveraging Web-Crawled Data for High-Quality Fine-Tuning | Aug 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 | 5 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Apr 2, 2024 | Distractor GenerationIn-Context Learning | CodeCode Available | 0 | 5 |
| Solving Arithmetic Word Problems Automatically Using Transformer and Unambiguous Representations | Dec 2, 2019 | Math | CodeCode Available | 0 | 5 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context LearningMath | CodeCode Available | 0 | 5 |
| Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks | Feb 24, 2025 | Graph Neural NetworkMath | CodeCode Available | 0 | 5 |
| Can We Use Small Models to Investigate Multimodal Fusion Methods? | Sep 1, 2022 | Math | CodeCode Available | 0 | 5 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8KMath | CodeCode Available | 0 | 5 |
| Can Vision-Language Models Evaluate Handwritten Math? | Jan 13, 2025 | Math | CodeCode Available | 0 | 5 |