| LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Jul 25, 2024 | Code GenerationComputational Efficiency | CodeCode Available | 2 |
| Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Jul 25, 2024 | Knowledge DistillationMathematical Reasoning | CodeCode Available | 2 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem ProvingMath | CodeCode Available | 4 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Jul 21, 2024 | Arithmetic ReasoningMath | CodeCode Available | 1 |
| A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | Jul 16, 2024 | Mathematical ReasoningQuestion Answering | —Unverified | 0 |
| NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context? | Jul 16, 2024 | 4k8k | CodeCode Available | 9 |
| Reliable Reasoning Beyond Natural Language | Jul 16, 2024 | GSM8KMathematical Reasoning | —Unverified | 0 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Jul 15, 2024 | Arithmetic ReasoningLanguage Modeling | —Unverified | 0 |
| Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model | Jul 14, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Jul 13, 2024 | BenchmarkingMath | CodeCode Available | 1 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | Jul 12, 2024 | GSM8KMath | —Unverified | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Jul 11, 2024 | GSM8KMath | —Unverified | 0 |
| MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | Jul 11, 2024 | Contrastive LearningLanguage Modelling | CodeCode Available | 4 |
| Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Jul 11, 2024 | GSM8KMath | —Unverified | 0 |
| SOLO: A Single Transformer for Scalable Vision-Language Modeling | Jul 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Progress or Regress? Self-Improvement Reversal in Post-training | Jul 6, 2024 | DiversityMathematical Reasoning | —Unverified | 0 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Jul 6, 2024 | Logical ReasoningMathematical Reasoning | CodeCode Available | 1 |
| Smart Vision-Language Reasoners | Jul 5, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | Jul 4, 2024 | AvgGSM8K | CodeCode Available | 1 |
| How Does Quantization Affect Multilingual LLMs? | Jul 3, 2024 | Mathematical ReasoningQuantization | —Unverified | 0 |
| TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Jul 3, 2024 | Automated Theorem ProvingCode Generation | CodeCode Available | 1 |
| Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation | Jul 2, 2024 | Code GenerationForm | CodeCode Available | 0 |
| FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models | Jul 1, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Jul 1, 2024 | MathMathematical Reasoning | CodeCode Available | 2 |
| Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Jun 30, 2024 | GSM8KMath | CodeCode Available | 1 |