| Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications | Feb 14, 2024 | Math | Unverified | 0 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 |
| GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | Feb 13, 2024 | GSM8K, Math | Unverified | 0 |
| EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages | Feb 12, 2024 | Automated Theorem Proving, Benchmarking | Unverified | 0 |
| Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models | Feb 12, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Feb 12, 2024 | Continual Pretraining, GSM8K | Code Available | 2 |
| Understanding the Progression of Educational Topics via Semantic Matching | Feb 10, 2024 | Math | Unverified | 0 |
| InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | Feb 9, 2024 | Data Augmentation, GSM8K | Code Available | 4 |
| V-STaR: Training Verifiers for Self-Taught Reasoners | Feb 9, 2024 | Code Generation, Math | Unverified | 0 |
| Noise Contrastive Alignment of Language Models with Explicit Rewards | Feb 8, 2024 | Language Modelling, Math | Code Available | 3 |