| Nerva: a Truly Sparse Implementation of Neural Networks | Jul 24, 2024 | Math | Code Available | 1 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Jul 24, 2024 | Math | Code Available | 1 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Jul 21, 2024 | Arithmetic Reasoning, Math | Code Available | 1 |
| Learning Goal-Conditioned Representations for Language Reward Models | Jul 18, 2024 | GSM8K, Math | Code Available | 1 |
| TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Jul 17, 2024 | Math, Multiple-choice | Code Available | 1 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | Benchmarking, Math | Code Available | 1 |
| AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Jul 11, 2024 | Language Modelling, Math | Code Available | 1 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Jul 4, 2024 | Avg, GSM8K | Code Available | 1 |
| Eliminating Position Bias of Language Models: A Mechanistic Approach | Jul 1, 2024 | Math, object-detection | Code Available | 1 |
| Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Jun 30, 2024 | GSM8K, Math | Code Available | 1 |