| L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | Sep 29, 2023 | Code Generation, Math | —Unverified | 0 | 0 |
| Better Process Supervision with Bi-directional Rewarding Signals | Mar 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Adapting the LodView RDF Browser for Navigation over the Multilingual Linguistic Linked Open Data Cloud | Aug 28, 2022 | Math | —Unverified | 0 | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | Mar 6, 2025 | Benchmarking, Math | —Unverified | 0 | 0 |
| THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | Apr 17, 2025 | Benchmarking, Math | —Unverified | 0 | 0 |
| Tighter 'uniform bounds for Black-Scholes implied volatility' and the applications to root-finding | Feb 17, 2023 | Math | —Unverified | 0 | 0 |
| Language Models with Conformal Factuality Guarantees | Feb 15, 2024 | Conformal Prediction, Language Modeling | —Unverified | 0 | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| YODA: Teacher-Student Progressive Learning for Language Models | Jan 28, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Large Language Models Are Struggle to Cope with Unreasonability in Math Problems | Mar 28, 2024 | Math | —Unverified | 0 | 0 |