| Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | Jan 19, 2025 | Automated Theorem Proving, Math | Unverified | 0 |
| Language Representation Favored Zero-Shot Cross-Domain Cognitive Diagnosis | Jan 18, 2025 | Cognitive Diagnosis, Math | Code Available | 0 |
| Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback | Jan 18, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision | Jan 14, 2025 | Instruction Following, Math | Code Available | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Jan 14, 2025 | GSM8K, Math | Code Available | 0 |
| Can Vision-Language Models Evaluate Handwritten Math? | Jan 13, 2025 | Math | Code Available | 0 |
| ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian | Jan 12, 2025 | Benchmarking, Math | Code Available | 1 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models | Jan 10, 2025 | Math | Unverified | 0 |
| A General Retrieval-Augmented Generation Framework for Multimodal Case-Based Reasoning Applications | Jan 9, 2025 | Math, RAG | Unverified | 0 |