| Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | Mar 25, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| 1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training | Mar 25, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Gemma 3 Technical Report | Mar 25, 2025 | Instruction Following, Math | Unverified | 0 |
| LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Mar 25, 2025 | Code Completion, Language Modeling | Code Available | 1 |
| Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | Mar 24, 2025 | Continual Pretraining, Language Modeling | Unverified | 0 |
| Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels | Mar 24, 2025 | Math | Unverified | 0 |
| SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | Mar 24, 2025 | Instruction Following, Math | Code Available | 7 |
| Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning | Mar 24, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Reasoning to Learn from Latent Thoughts | Mar 24, 2025 | Math, Text Generation | Code Available | 2 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 |
| Long Is More Important Than Difficult for Training Reasoning Models | Mar 23, 2025 | Math | Unverified | 0 |
| Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Mar 23, 2025 | GSM8K, Math | Code Available | 0 |
| MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | Mar 23, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| ChatBench: From Static Benchmarks to Human-AI Evaluation | Mar 22, 2025 | Math, MMLU | Code Available | 0 |
| FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Mar 21, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | Mar 20, 2025 | Math, Memorization | Unverified | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Mar 18, 2025 | GSM8K, Math | Unverified | 0 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPU, Math | Unverified | 0 |
| Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | Mar 17, 2025 | GSM8K, Math | Unverified | 0 |
| Pensez: Less Data, Better Reasoning -- Rethinking French LLM | Mar 17, 2025 | Large Language Model, Math | Unverified | 0 |
| xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | Mar 17, 2025 | Mamba, Math | Code Available | 7 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code Available | 1 |
| SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | Mar 16, 2025 | Board Games, Card Games | Unverified | 0 |
| Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | Mar 13, 2025 | Large Language Model, Math | Unverified | 0 |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Mar 13, 2025 | Domain Generalization, Math | Code Available | 4 |
| VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | Mar 13, 2025 | Image Retrieval, Math | Code Available | 1 |
| Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression | Mar 13, 2025 | Code Generation, Conformal Prediction | Unverified | 0 |
| StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error | Mar 13, 2025 | Math | Code Available | 0 |
| Understanding the Logical Capabilities of Large Language Models via Out-of-Context Representation Learning | Mar 13, 2025 | In-Context Learning, Math | Unverified | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | Unverified | 0 |
| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | Chatbot, Language Modeling | Code Available | 1 |
| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | Unverified | 0 |
| Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | Mar 10, 2025 | Math, Meta Reinforcement Learning | Unverified | 0 |
| Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math, Multimodal Reasoning | Code Available | 5 |
| Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance | Mar 9, 2025 | Ethics, Math | Unverified | 0 |
| InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | Mar 9, 2025 | Computational Efficiency, Math | Unverified | 0 |
| Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Mar 7, 2025 | GPU, Math | Unverified | 0 |
| Compositional Causal Reasoning Evaluation in Language Models | Mar 6, 2025 | Math | Unverified | 0 |
| HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks | Mar 6, 2025 | Chatbot, Logical Reasoning | Unverified | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | Mar 6, 2025 | Benchmarking, Math | Unverified | 0 |
| Better Process Supervision with Bi-directional Rewarding Signals | Mar 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Mar 6, 2025 | GSM8K, Math | Unverified | 0 |
| START: Self-taught Reasoner with Tools | Mar 6, 2025 | Math, Self-Learning | Unverified | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | Mar 5, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach | Mar 5, 2025 | Instruction Following, Math | Unverified | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer Selection, Math | Unverified | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | Mar 4, 2025 | GSM8K, Math | Unverified | 0 |
| PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | Mar 4, 2025 | GSM8K, Math | Code Available | 1 |
| What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret | Mar 3, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Mar 3, 2025 | Math, MMLU | Code Available | 0 |