| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation | May 27, 2024 | Code Generation, HumanEval | Code Available | 1 |
| STRIDE: A Tool-Assisted LLM Agent Framework for Strategic and Interactive Decision-Making | May 25, 2024 | Decision Making, Mathematical Reasoning | Code Available | 1 |
| Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications | May 24, 2024 | Code Generation, Low-rank compression | Unverified | 0 |
| VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks | May 24, 2024 | Mathematical Reasoning, Natural Language Understanding | Code Available | 1 |
| Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | May 24, 2024 | Atari Games, Mathematical Reasoning | Code Available | 2 |
| Can LLMs Solve longer Math Word Problems Better? | May 23, 2024 | Data Augmentation, Math | Code Available | 0 |
| DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | May 23, 2024 | Automated Theorem Proving, Mathematical Reasoning | Unverified | 0 |
| JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models | May 23, 2024 | Knowledge Distillation, Math | Code Available | 1 |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | May 22, 2024 | Mathematical Reasoning, Multiple-choice | Code Available | 1 |