| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation, GSM8K | Code Available | 1 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | Mar 4, 2024 | GSM8K, Math | Unverified | 0 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Mar 2, 2024 | Math, Misconceptions | Code Available | 1 |
| ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data | Mar 1, 2024 | Math | Unverified | 0 |
| Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Feb 29, 2024 | Math | Code Available | 2 |
| PRSA: Prompt Stealing Attacks against Real-World Prompt Services | Feb 29, 2024 | Math | Unverified | 0 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K, Math | Code Available | 2 |
| StarCoder 2 and The Stack v2: The Next Generation | Feb 29, 2024 | Code Completion, Code Generation | Code Available | 7 |
| Data Interpreter: An LLM Agent For Data Science | Feb 28, 2024 | Code Generation, Language Modelling | Unverified | 0 |
| Adversarial Math Word Problem Generation | Feb 27, 2024 | Math | Code Available | 0 |