| Can AI Assistants Know What They Don't Know? | Jan 24, 2024 | MathOpen-Domain Question Answering | CodeCode Available | 2 | 5 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | May 26, 2025 | MathProblem Decomposition | CodeCode Available | 2 | 5 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | MathMathematical Reasoning | CodeCode Available | 2 | 5 |
| MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Oct 10, 2024 | MathMathematical Reasoning | CodeCode Available | 2 | 5 |
| Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Mar 14, 2024 | MathReinforcement Learning (RL) | CodeCode Available | 2 | 5 |
| Evaluating Mathematical Reasoning Beyond Accuracy | Apr 8, 2024 | MathMathematical Reasoning | CodeCode Available | 2 | 5 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | ChatbotImage Captioning | CodeCode Available | 2 | 5 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | May 24, 2024 | Common Sense ReasoningLanguage Modelling | CodeCode Available | 2 | 5 |
| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Feb 26, 2025 | Code GenerationHumanEval | CodeCode Available | 2 | 5 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8KMath | CodeCode Available | 2 | 5 |