| Multi-tool Integration Application for Math Reasoning Using Large Language Model | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Taming Generative Diffusion Prior for Universal Blind Image Restoration | Aug 21, 2024 | Image Restoration, Mathematical Reasoning | Unverified | 0 |
| SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding | Aug 21, 2024 | Logical Reasoning, Mathematical Reasoning | Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | Benchmarking, In-Context Learning | Code Available | 0 |
| Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting | Aug 18, 2024 | HumanEval, Mathematical Reasoning | Unverified | 0 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty | Aug 13, 2024 | Mathematical Reasoning, Question Answering | Code Available | 0 |
| MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | Aug 3, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models | Jul 26, 2024 | Mathematical Reasoning | Unverified | 0 |
| Reliable Reasoning Beyond Natural Language | Jul 16, 2024 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | Jul 16, 2024 | Mathematical Reasoning, Question Answering | Unverified | 0 |
| Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Jul 15, 2024 | Arithmetic Reasoning, Language Modeling | Unverified | 0 |
| Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model | Jul 14, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | Jul 12, 2024 | GSM8K, Math | Unverified | 0 |
| Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Jul 11, 2024 | GSM8K, Math | Unverified | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Jul 11, 2024 | GSM8K, Math | Unverified | 0 |
| Progress or Regress? Self-Improvement Reversal in Post-training | Jul 6, 2024 | Diversity, Mathematical Reasoning | Unverified | 0 |
| Smart Vision-Language Reasoners | Jul 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| How Does Quantization Affect Multilingual LLMs? | Jul 3, 2024 | Mathematical Reasoning, Quantization | Unverified | 0 |
| Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation | Jul 2, 2024 | Code Generation, Form | Code Available | 0 |
| FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models | Jul 1, 2024 | Mathematical Reasoning | Code Available | 0 |
| LiteSearch: Efficacious Tree Search for LLM | Jun 29, 2024 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement | Jun 29, 2024 | Contrastive Learning, Mathematical Reasoning | Unverified | 0 |
| The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic | Jun 28, 2024 | Language Modeling, Language Modelling | Unverified | 0 |