| EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Mar 11, 2025 | Chatbot, Language Modeling | Code Available | 1 |
| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | Unverified | 0 |
| Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | Mar 10, 2025 | Math, Meta Reinforcement Learning | Unverified | 0 |
| Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | Mar 9, 2025 | Math, Multimodal Reasoning | Code Available | 5 |
| InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | Mar 9, 2025 | Computational Efficiency, Math | Unverified | 0 |
| Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance | Mar 9, 2025 | Ethics, Math | Unverified | 0 |
| Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Mar 7, 2025 | GPU, Math | Unverified | 0 |
| Compositional Causal Reasoning Evaluation in Language Models | Mar 6, 2025 | Math | Unverified | 0 |
| HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks | Mar 6, 2025 | Chatbot, Logical Reasoning | Unverified | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | Mar 6, 2025 | Benchmarking, Math | Unverified | 0 |