| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations | Oct 31, 2023 | GSM8K, Math | Code Available | 1 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | Benchmarking, Evidence Selection | Code Available | 1 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | Benchmarking, Math | Code Available | 1 |
| A Reinforcement Learning Environment for Mathematical Reasoning via Program Synthesis | Jul 15, 2021 | Mathematical Reasoning, Program Synthesis | Code Available | 1 |
| MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization | Jan 12, 2024 | Mathematical Reasoning | Code Available | 1 |
| Boosting MLLM Reasoning with Text-Debiased Hint-GRPO | Mar 31, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 |
| A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Feb 3, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical Reasoning, Mathematical Reasoning | Code Available | 1 |
| Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Jun 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 1 |