| Can Stories Help LLMs Reason? Curating Information Space Through Narrative | Oct 25, 2024 | Math | Unverified | 0 |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | Oct 24, 2024 | GSM8K, Math | Unverified | 0 |
| From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems | Oct 24, 2024 | Benchmarking, Common Sense Reasoning | Unverified | 0 |
| Mixture of Parrots: Experts improve memorization more than reasoning | Oct 24, 2024 | Math, Memorization | Unverified | 0 |
| MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning | Oct 23, 2024 | Math, Mixture-of-Experts | Unverified | 0 |
| Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality | Oct 22, 2024 | Math | Unverified | 0 |
| JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | Oct 22, 2024 | Math | Unverified | 0 |
| Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration | Oct 22, 2024 | Math | Unverified | 0 |
| Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | Oct 22, 2024 | GSM8K, Math | Unverified | 0 |
| No more hard prompts: SoftSRV prompting for synthetic data generation | Oct 21, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| PromptHive: Bringing Subject Matter Experts Back to the Forefront with Collaborative Prompt Engineering for Educational Content Creation | Oct 21, 2024 | Math, Prompt Engineering | Unverified | 0 |
| On Designing Effective RL Reward at Training Time for LLM Reasoning | Oct 19, 2024 | GSM8K, Math | Unverified | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | Oct 19, 2024 | Logical Reasoning, Math | Unverified | 0 |
| LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems | Oct 18, 2024 | In-Context Learning, Math | Unverified | 0 |
| Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning | Oct 18, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | Oct 18, 2024 | Math, Question Answering | Unverified | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Oct 17, 2024 | GSM8K, Language Modeling | Code Available | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 |
| When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems | Oct 16, 2024 | Hallucination, Math | Unverified | 0 |
| Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | Oct 15, 2024 | Instruction Following, Knowledge Distillation | Unverified | 0 |
| MIND: Math Informed syNthetic Dialogues for Pretraining LLMs | Oct 15, 2024 | GSM8K, Math | Unverified | 0 |
| Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps | Oct 14, 2024 | Math | Unverified | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 |
| Embedding Self-Correction as an Inherent Ability in Large Language Models for Enhanced Mathematical Reasoning | Oct 14, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | Oct 13, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | Oct 13, 2024 | Computational Efficiency, Math | Unverified | 0 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | Oct 11, 2024 | Math, Spatial Reasoning | Unverified | 0 |
| Cognitive Noise and Altruistic Preferences | Oct 10, 2024 | Math | Unverified | 0 |
| Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Oct 10, 2024 | Arithmetic Reasoning, Math | Code Available | 0 |
| Herald: A Natural Language Annotated Lean 4 Dataset | Oct 9, 2024 | Math, Mathematical Reasoning | Unverified | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | Oct 9, 2024 | GSM8K, Math | Unverified | 0 |
| Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | Oct 9, 2024 | Math | Unverified | 0 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial Robustness, Math | Code Available | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | Unverified | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Oct 8, 2024 | Image Retrieval, Math | Unverified | 0 |
| Solving Functional Optimization with Deep Networks and Variational Principles | Oct 8, 2024 | Math | Unverified | 0 |
| Intriguing Properties of Large Language and Vision Models | Oct 7, 2024 | cross-modal alignment, Large Language Model | Unverified | 0 |
| Rule-based Data Selection for Large Language Models | Oct 7, 2024 | Benchmarking, Math | Unverified | 0 |
| Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | Oct 7, 2024 | Attribute, GSM8K | Unverified | 0 |
| fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models | Oct 7, 2024 | Math | Unverified | 0 |
| Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification | Oct 5, 2024 | GSM8K, Math | Unverified | 0 |
| BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts | Oct 5, 2024 | Math | Unverified | 0 |
| Deliberate Reasoning for LLMs as Structure-aware Planning with Accurate World Model | Oct 4, 2024 | Diversity, Logical Reasoning | Unverified | 0 |
| Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models | Oct 3, 2024 | All, Language Modeling | Unverified | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8K, Language Modeling | Unverified | 0 |
| Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure | Oct 3, 2024 | Math | Code Available | 0 |
| Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection | Oct 3, 2024 | Math, parameter-efficient fine-tuning | Code Available | 0 |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | Oct 3, 2024 | GSM8K, Math | Unverified | 0 |
| An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Oct 2, 2024 | 8k, Math | Code Available | 0 |
| Evaluating Robustness of Reward Models for Mathematical Reasoning | Oct 2, 2024 | Math, Mathematical Reasoning | Unverified | 0 |