| Integrating External Tools with Large Language Models to Improve Accuracy | Jul 9, 2025 | Mathematical Reasoning, MMLU | —Unverified | 0 |
| InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Jan 21, 2025 | Instruction Following, Mathematical Reasoning | —Unverified | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | Jun 10, 2023 | Math, Mathematical Reasoning | —Unverified | 0 |
| Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles | Jun 16, 2025 | Diversity, Mathematical Reasoning | —Unverified | 0 |
| Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | Jun 13, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | Jun 5, 2024 | Mathematical Reasoning, Natural Language Inference | —Unverified | 0 |
| Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Jul 11, 2024 | GSM8K, Math | —Unverified | 0 |
| iTBLS: A Dataset of Interactive Conversations Over Tabular Information | Apr 19, 2024 | Articles, Mathematical Reasoning | —Unverified | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | Jun 19, 2023 | In-Context Learning, Language Modeling | —Unverified | 0 |
| Keep Guessing? When Considering Inference Scaling, Mind the Baselines | Oct 20, 2024 | Mathematical Reasoning | —Unverified | 0 |