| A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | Mar 21, 2024 | Active Learning, Math | —Unverified | 0 |
| Can I understand what I create? Self-Knowledge Evaluation of Large Language Models | Jun 10, 2024 | Math | —Unverified | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | May 22, 2023 | Benchmarking, Math | —Unverified | 0 |
| A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio | Sep 10, 2024 | Emotional Intelligence, Math | —Unverified | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Jan 26, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks | Mar 6, 2025 | Chatbot, Logical Reasoning | —Unverified | 0 |
| AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | May 22, 2025 | Math, reinforcement-learning | —Unverified | 0 |
| hep-th | Jun 27, 2018 | Binary Classification, Math | —Unverified | 0 |
| Herald: A Natural Language Annotated Lean 4 Dataset | Oct 9, 2024 | Math, Mathematical Reasoning | —Unverified | 0 |
| EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | Oct 28, 2024 | ARC, Math | —Unverified | 0 |