| LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | May 27, 2024 | Benchmarking, GSM8K | Code Available | 2 | 5 |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Aug 15, 2023 | Arithmetic Reasoning, Math | Code Available | 2 | 5 |
| Learning to Reason for Long-Form Story Generation | Mar 28, 2025 | Form, Math | Code Available | 2 | 5 |
| Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Jun 13, 2024 | Math, Quantization | Code Available | 2 | 5 |
| Essential-Web v1.0: 24T tokens of organized web data | Jun 17, 2025 | Math | Code Available | 2 | 5 |
| Archon: An Architecture Search Framework for Inference-Time Techniques | Sep 23, 2024 | Hyperparameter Optimization, Instruction Following | Code Available | 2 | 5 |
| AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Jun 10, 2025 | Math | Code Available | 2 | 5 |
| Steering Large Language Models between Code Execution and Textual Reasoning | Oct 4, 2024 | Code Generation, Math | Code Available | 2 | 5 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Sep 19, 2024 | Computational Efficiency, GSM8K | Code Available | 2 | 5 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | Math, Mixture-of-Experts | Code Available | 2 | 5 |