| Multi-lingual Functional Evaluation for Large Language Models | Jun 25, 2025 | Belebele, Instruction Following | Unverified | 0 |
| AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Jun 25, 2025 | Language Modeling | Code Available | 0 |
| When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | Jun 25, 2025 | Math | Unverified | 0 |
| OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Jun 25, 2025 | Language Modeling | Code Available | 2 |
| Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities | Jun 23, 2025 | Math | Unverified | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | Unverified | 0 |
| ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Jun 23, 2025 | Math | Code Available | 0 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1 |
| Shrinking the Generation-Verification Gap with Weak Verifiers | Jun 22, 2025 | Math | Unverified | 0 |