| MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection | Apr 17, 2025 | Anomaly DetectionData Augmentation | —Unverified | 0 |
| Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading | Apr 16, 2025 | 2kCode Generation | —Unverified | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | Apr 16, 2025 | GSM8KMath | —Unverified | 0 |
| Reinforcement Learning from Human Feedback | Apr 16, 2025 | MathPhilosophy | CodeCode Available | 5 |
| ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 15, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Apr 15, 2025 | MathQuantum Machine Learning | CodeCode Available | 1 |
| Heimdall: test-time scaling on the generative verification | Apr 14, 2025 | Math | —Unverified | 0 |
| M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Apr 14, 2025 | MambaMath | CodeCode Available | 1 |
| Efficient Process Reward Model Training via Active Learning | Apr 14, 2025 | Active LearningMath | CodeCode Available | 1 |
| The Jailbreak Tax: How Useful are Your Jailbreak Outputs? | Apr 14, 2025 | Math | CodeCode Available | 1 |
| Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Apr 13, 2025 | GSM8KMath | CodeCode Available | 3 |
| VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Apr 10, 2025 | MathMultimodal Reasoning | CodeCode Available | 2 |
| Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Apr 10, 2025 | MathMMLU | CodeCode Available | 3 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8KMath | —Unverified | 0 |
| Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Apr 10, 2025 | MathMMLU | CodeCode Available | 1 |
| GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable | Apr 10, 2025 | GPUMath | —Unverified | 0 |
| MDIT: A Model-free Data Interpolation Method for Diverse Instruction Tuning | Apr 9, 2025 | Code GenerationDiversity | —Unverified | 0 |
| MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Apr 8, 2025 | MathMultimodal Reasoning | CodeCode Available | 1 |
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Apr 8, 2025 | MathMathematical Reasoning | CodeCode Available | 2 |
| Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Apr 7, 2025 | Dialogue EvaluationFairness | CodeCode Available | 2 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8KMath | —Unverified | 0 |
| Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | Apr 7, 2025 | MathQuantization | CodeCode Available | 2 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | MathMathematical Reasoning | CodeCode Available | 2 |
| Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification | Apr 7, 2025 | Logical ReasoningMath | —Unverified | 0 |
| Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning | Apr 6, 2025 | Math | —Unverified | 0 |
| oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science | Apr 5, 2025 | Math | —Unverified | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | Apr 4, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Apr 4, 2025 | Mathreinforcement-learning | —Unverified | 0 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8KIn-Context Learning | CodeCode Available | 1 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | DiversityMath | CodeCode Available | 2 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Apr 2, 2025 | 3D ReconstructionBenchmarking | CodeCode Available | 1 |
| Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models | Apr 2, 2025 | Math | —Unverified | 0 |
| How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study | Apr 1, 2025 | Code GenerationMath | —Unverified | 0 |
| GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | Apr 1, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Hawkeye:Efficient Reasoning with Model Collaboration | Apr 1, 2025 | Mathmodel | —Unverified | 0 |
| Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | Apr 1, 2025 | MathMathematical Problem-Solving | —Unverified | 0 |
| Investigating Large Language Models in Diagnosing Students' Cognitive Skills in Math Problem-solving | Apr 1, 2025 | Math | —Unverified | 0 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | MathSpatial Reasoning | CodeCode Available | 2 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8KMath | CodeCode Available | 1 |
| An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function | Mar 31, 2025 | Data CompressionMath | CodeCode Available | 0 |
| DebFlow: Automating Agent Creation via Agent Debate | Mar 31, 2025 | Math | —Unverified | 0 |
| ToRL: Scaling Tool-Integrated RL | Mar 30, 2025 | Mathreinforcement-learning | CodeCode Available | 3 |
| Learning to Reason for Long-Form Story Generation | Mar 28, 2025 | FormMath | CodeCode Available | 2 |
| QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Mar 28, 2025 | Logical ReasoningMath | CodeCode Available | 1 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPUGSM8K | CodeCode Available | 2 |
| ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | Mar 27, 2025 | Math | CodeCode Available | 1 |
| Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | Mar 27, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data VisualizationMath | CodeCode Available | 0 |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General KnowledgeMath | CodeCode Available | 0 |
| Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | Mar 25, 2025 | Math | —Unverified | 0 |