| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Not All LLM Reasoners Are Created Equal | Oct 2, 2024 | All, Code Generation | Unverified | 0 |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | Oct 2, 2024 | Cross-Lingual Transfer, Math | Unverified | 0 |
| Mind Scramble: Unveiling Large Language Model Psychology Via Typoglycemia | Oct 2, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Oct 2, 2024 | GSM8K, Math | Code Available | 2 |
| Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing | Oct 2, 2024 | Contrastive Learning, Knowledge Tracing | Code Available | 0 |
| OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | Oct 2, 2024 | Arithmetic Reasoning, Large Language Model | Code Available | 4 |
| LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Oct 2, 2024 | Instruction Following, Math | Code Available | 1 |
| Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems | Sep 30, 2024 | GSM8K, Math | Code Available | 0 |
| Instance-adaptive Zero-shot Chain-of-Thought Prompting | Sep 30, 2024 | GSM8K, Math | Unverified | 0 |
| The Perfect Blend: Redefining RLHF with Mixture of Judges | Sep 30, 2024 | Instruction Following, Math | Unverified | 0 |
| INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models | Sep 28, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Revisiting the Superficial Alignment Hypothesis | Sep 27, 2024 | Instruction Following, Math | Unverified | 0 |
| On the Inductive Bias of Stacking Towards Improving Reasoning | Sep 27, 2024 | Inductive Bias, Math | Unverified | 0 |
| BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | Sep 26, 2024 | Math, Mathematical Problem-Solving | Code Available | 1 |
| Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy | Sep 26, 2024 | Knowledge Tracing, Math | Unverified | 0 |
| Democratizing Signal Processing and Machine Learning: Math Learning Equity for Elementary and Middle School Students | Sep 25, 2024 | Math | Unverified | 0 |
| PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | Sep 25, 2024 | GSM8K, Math | Unverified | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | Sep 25, 2024 | Chatbot, GSM8K | Unverified | 0 |
| Models Can and Should Embrace the Communicative Nature of Human-Generated Math | Sep 25, 2024 | Math | Unverified | 0 |
| Archon: An Architecture Search Framework for Inference-Time Techniques | Sep 23, 2024 | Hyperparameter Optimization, Instruction Following | Code Available | 2 |
| PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL | Sep 21, 2024 | Math, Text to SQL | Code Available | 0 |
| Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning | Sep 20, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data Augmentation, Diversity | Unverified | 0 |
| Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Sep 19, 2024 | Computational Efficiency, GSM8K | Code Available | 2 |