| ConvNLP: Image-based AI Text Detection | Jul 9, 2024 | Domain Generalization, Math | Unverified | 0 |
| Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models | Jul 9, 2024 | Math | Code Available | 0 |
| Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns? | Jul 6, 2024 | Math | Code Available | 0 |
| Smart Vision-Language Reasoners | Jul 5, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior | Jul 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | Jun 29, 2024 | Binary Classification, GSM8K | Unverified | 0 |
| CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models | Jun 28, 2024 | Diversity, Math | Unverified | 0 |
| ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | Jun 28, 2024 | Bilevel Optimization, Instruction Following | Unverified | 0 |
| DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Jun 27, 2024 | Distractor Generation, Math | Code Available | 0 |
| Task Oriented In-Domain Data Augmentation | Jun 24, 2024 | Data Augmentation, Math | Unverified | 0 |
| Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions | Jun 20, 2024 | Active Learning, Math | Unverified | 0 |
| Towards Infinite-Long Prefix in Transformer | Jun 20, 2024 | Math, parameter-efficient fine-tuning | Code Available | 0 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | Jun 20, 2024 | GSM8K, Heuristic Search | Unverified | 0 |
| Can LLMs Reason in the Wild with Programs? | Jun 19, 2024 | GSM8K, Math | Code Available | 0 |
| Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever | Jun 19, 2024 | Math, Semantic Similarity | Unverified | 0 |
| Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems | Jun 18, 2024 | In-Context Learning, Math | Unverified | 0 |
| GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Jun 17, 2024 | Image Generation, Math | Code Available | 0 |
| Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | Jun 17, 2024 | Math | Unverified | 0 |
| Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment | Jun 17, 2024 | Logical Reasoning, Math | Unverified | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | Jun 16, 2024 | Benchmarking, Math | Unverified | 0 |
| ReMI: A Dataset for Reasoning with Multiple Images | Jun 13, 2024 | Chart Understanding, Math | Unverified | 0 |
| CLST: Cold-Start Mitigation in Knowledge Tracing by Aligning a Generative Language Model as a Students' Knowledge Tracer | Jun 13, 2024 | Domain Generalization, Knowledge Tracing | Unverified | 0 |
| Can I understand what I create? Self-Knowledge Evaluation of Large Language Models | Jun 10, 2024 | Math | Unverified | 0 |
| Human Learning about AI | Jun 8, 2024 | Math | Unverified | 0 |
| A multi-core periphery perspective: Ranking via relative centrality | Jun 6, 2024 | Math | Unverified | 0 |