| The Perfect Blend: Redefining RLHF with Mixture of Judges | Sep 30, 2024 | Instruction Following, Math | —Unverified | 0 |
| Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension | Aug 31, 2019 | Math, Question Answering | —Unverified | 0 |
| GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | Feb 13, 2024 | GSM8K, Math | —Unverified | 0 |
| Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | Feb 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | Dec 14, 2023 | Arithmetic Reasoning, Few-Shot Learning | —Unverified | 0 |
| BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | Feb 6, 2025 | In-Context Learning, Knowledge Distillation | —Unverified | 0 |
| GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable | Apr 10, 2025 | GPU, Math | —Unverified | 0 |
| GPT takes the SAT: Tracing changes in Test Difficulty and Math Performance of Students | Sep 16, 2024 | Math | —Unverified | 0 |
| GPU Domain Specialization via Composable On-Package Architecture | Apr 5, 2021 | GPU, Math | —Unverified | 0 |
| Graders should cheat: privileged information enables expert-level automated evaluations | Feb 16, 2025 | Math | —Unverified | 0 |