| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | Oct 6, 2024 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | —Unverified | 0 |
| Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads | Jun 22, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| Evaluating the Meta- and Object-Level Reasoning of Large Language Models for Question Answering | Feb 14, 2025 | Mathematical Reasoning, Object | —Unverified | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 |
| Evaluation of OpenAI o1: Opportunities and Challenges of AGI | Sep 27, 2024 | Emotion Recognition, Large Language Model | —Unverified | 0 |
| Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | Dec 5, 2024 | Few-Shot Learning, GSM8K | —Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | —Unverified | 0 |
| Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains | Mar 31, 2025 | Mathematical Reasoning, reinforcement-learning | —Unverified | 0 |