| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Apr 7, 2024 | Image Comprehension, Math | Code Available | 0 |
| Data Augmentation with In-Context Learning and Comparative Evaluation in Math Word Problem Solving | Apr 5, 2024 | Data Augmentation, In-Context Learning | Unverified | 0 |
| HyperCLOVA X Technical Report | Apr 2, 2024 | Instruction Following, Machine Translation | Unverified | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Apr 2, 2024 | Distractor Generation, In-Context Learning | Code Available | 0 |
| LM^2: A Simple Society of Language Models Solves Complex Reasoning | Apr 2, 2024 | Math, MedQA | Code Available | 0 |
| IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | Apr 1, 2024 | Benchmarking, Math | Unverified | 0 |
| Exploring the Mystery of Influential Data for Mathematical Reasoning | Apr 1, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Stable Code Technical Report | Apr 1, 2024 | Code Completion, Language Modelling | Unverified | 0 |
| Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models | Apr 1, 2024 | In-Context Learning, Math | Code Available | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Mar 30, 2024 | Math, Mathematical Problem-Solving | Code Available | 0 |