| TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Jul 3, 2024 | Automated Theorem ProvingCode Generation | CodeCode Available | 1 |
| Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation | Jul 2, 2024 | Code GenerationForm | CodeCode Available | 0 |
| FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models | Jul 1, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | Jul 1, 2024 | MathMathematical Reasoning | CodeCode Available | 2 |
| Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | Jun 30, 2024 | GSM8KMath | CodeCode Available | 1 |
| H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables | Jun 29, 2024 | Fact VerificationMathematical Reasoning | CodeCode Available | 1 |
| LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement | Jun 29, 2024 | Contrastive LearningMathematical Reasoning | —Unverified | 0 |
| LiteSearch: Efficacious Tree Search for LLM | Jun 29, 2024 | GSM8KMathematical Reasoning | —Unverified | 0 |
| The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic | Jun 28, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Jun 28, 2024 | Code GenerationHallucination | —Unverified | 0 |