| AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Jun 25, 2025 | Language Modeling | Code Available | 0 |
| When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | Jun 25, 2025 | Math | Unverified | 0 |
| Multi-lingual Functional Evaluation for Large Language Models | Jun 25, 2025 | Belebele, Instruction Following | Unverified | 0 |
| ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jun 23, 2025 | Math | Unverified | 0 |
| Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities | Jun 23, 2025 | Math | Unverified | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | Unverified | 0 |
| Shrinking the Generation-Verification Gap with Weak Verifiers | Jun 22, 2025 | Math | Unverified | 0 |
| Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study | Jun 20, 2025 | Math | Unverified | 0 |
| No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | Jun 20, 2025 | Math, reinforcement-learning | Unverified | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 |
| SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Jun 17, 2025 | Math, Spatial Reasoning | Unverified | 0 |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | Jun 17, 2025 | GPU, Large Language Model | Unverified | 0 |
| Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | Jun 16, 2025 | Math | Unverified | 0 |
| AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy | Jun 16, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks | Jun 16, 2025 | Form, Math | Unverified | 0 |
| Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | Jun 16, 2025 | Math, reinforcement-learning | Unverified | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | Unverified | 0 |
| Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | Jun 13, 2025 | Math, Navigate | Unverified | 0 |
| ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Jun 12, 2025 | Math | Code Available | 0 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8K, Math | Code Available | 0 |
| TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games | Jun 11, 2025 | Logical Reasoning, Math | Unverified | 0 |
| Reinforce LLM Reasoning through Multi-Agent Reflection | Jun 10, 2025 | Math, Out-of-Distribution Generalization | Unverified | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | Jun 10, 2025 | GSM8K, Math | Unverified | 0 |
| LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs | Jun 10, 2025 | Large Language Model, Math | Unverified | 0 |
| Learning to Reason Across Parallel Samples for LLM Reasoning | Jun 10, 2025 | Math, Re-Ranking | Unverified | 0 |