| Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | Jun 29, 2024 | Binary Classification, GSM8K | —Unverified | 0 |
| CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models | Jun 28, 2024 | Diversity, Math | —Unverified | 0 |
| ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | Jun 28, 2024 | Bilevel Optimization, Instruction Following | —Unverified | 0 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Jun 27, 2024 | Distractor Generation, Math | Code Available | 0 |
| Task Oriented In-Domain Data Augmentation | Jun 24, 2024 | Data Augmentation, Math | —Unverified | 0 |
| Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions | Jun 20, 2024 | Active Learning, Math | —Unverified | 0 |
| Towards Infinite-Long Prefix in Transformer | Jun 20, 2024 | Math, parameter-efficient fine-tuning | Code Available | 0 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | Jun 20, 2024 | GSM8K, Heuristic Search | —Unverified | 0 |
| Can LLMs Reason in the Wild with Programs? | Jun 19, 2024 | GSM8K, Math | Code Available | 0 |
| Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever | Jun 19, 2024 | Math, Semantic Similarity | —Unverified | 0 |