| LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Jul 25, 2024 | Code Generation, Computational Efficiency | Code Available | 2 |
| Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Jul 25, 2024 | Knowledge Distillation, Mathematical Reasoning | Code Available | 2 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem Proving, Math | Code Available | 4 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Jul 21, 2024 | Arithmetic Reasoning, Math | Code Available | 1 |
| A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | Jul 16, 2024 | Mathematical Reasoning, Question Answering | Unverified | 0 |
| NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Jul 16, 2024 | 4k, 8k | Code Available | 9 |
| Reliable Reasoning Beyond Natural Language | Jul 16, 2024 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Jul 15, 2024 | Arithmetic Reasoning, Language Modeling | Unverified | 0 |
| Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model | Jul 14, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | Benchmarking, Math | Code Available | 1 |