| Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | Jun 13, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Spurious Rewards: Rethinking Training Signals in RLVR | Jun 12, 2025 | MathMathematical Reasoning | CodeCode Available | 3 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | Jun 12, 2025 | GSM8KMathematical Reasoning | —Unverified | 0 |
| NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors | Jun 12, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational EfficiencyGSM8K | —Unverified | 0 |
| Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning | Jun 12, 2025 | Instruction FollowingMathematical Reasoning | CodeCode Available | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | Jun 12, 2025 | Logical ReasoningMathematical Problem-Solving | —Unverified | 0 |
| Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning | Jun 12, 2025 | Mathematical Reasoning | —Unverified | 0 |
| RePO: Replay-Enhanced Policy Optimization | Jun 11, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Jun 11, 2025 | Image CaptioningMath | CodeCode Available | 2 |
| Large Language Models for Design Structure Matrix Optimization | Jun 11, 2025 | Combinatorial OptimizationMathematical Reasoning | —Unverified | 0 |
| Towards Efficient and Effective Alignment of Large Language Models | Jun 11, 2025 | Mathematical ReasoningMeta-Learning | —Unverified | 0 |
| CoRT: Code-integrated Reasoning within Thinking | Jun 11, 2025 | Mathematical Reasoning | CodeCode Available | 2 |
| Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs | Jun 11, 2025 | Mathematical Reasoning | CodeCode Available | 0 |
| Can A Gamer Train A Mathematical Reasoning Model? | Jun 10, 2025 | GPUMathematical Reasoning | CodeCode Available | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | Jun 10, 2025 | BenchmarkingMathematical Reasoning | —Unverified | 0 |
| VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Jun 10, 2025 | Mathematical ReasoningVisual Reasoning | CodeCode Available | 0 |
| A Survey on Large Language Models for Mathematical Reasoning | Jun 10, 2025 | Answer GenerationMathematical Reasoning | —Unverified | 0 |
| WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jun 9, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic | Jun 9, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Can Theoretical Physics Research Benefit from Language Agents? | Jun 6, 2025 | Code GenerationMathematical Reasoning | —Unverified | 0 |
| Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning | Jun 5, 2025 | Mathematical ReasoningProblem Decomposition | —Unverified | 0 |
| Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models | Jun 5, 2025 | Mathematical Reasoning | —Unverified | 0 |
| ProRefine: Inference-time Prompt Refinement with Textual Feedback | Jun 5, 2025 | Mathematical Reasoning | —Unverified | 0 |
| LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoningreinforcement-learning | CodeCode Available | 0 |