| Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs | Sep 4, 2024 | Mathparameter-efficient fine-tuning | —Unverified | 0 |
| More is More: Addition Bias in Large Language Models | Sep 4, 2024 | MathText Summarization | CodeCode Available | 0 |
| General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | DecoderMath | CodeCode Available | 9 |
| S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners | Sep 3, 2024 | GSM8KMath | —Unverified | 0 |
| MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Aug 30, 2024 | Image CaptioningLanguage Modeling | CodeCode Available | 1 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | Aug 29, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | Aug 29, 2024 | Code GenerationDiversity | —Unverified | 0 |
| Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems | Aug 29, 2024 | Math | —Unverified | 0 |
| Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic | Aug 29, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | Aug 28, 2024 | Data AugmentationGSM8K | —Unverified | 0 |
| What makes math problems hard for reinforcement learning: a case study | Aug 27, 2024 | MathReinforcement Learning (RL) | CodeCode Available | 1 |
| Generative Verifiers: Reward Modeling as Next-Token Prediction | Aug 27, 2024 | MathPrediction | —Unverified | 0 |
| Students' Perceived Roles, Opportunities, and Challenges of a Generative AI-powered Teachable Agent: A Case of Middle School Math Class | Aug 26, 2024 | Math | —Unverified | 0 |
| Multi-tool Integration Application for Math Reasoning Using Large Language Model | Aug 22, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Aug 21, 2024 | 8kGSM8K | CodeCode Available | 1 |
| Mathematical Information Retrieval: Search and Question Answering | Aug 21, 2024 | Information RetrievalMath | —Unverified | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Aug 20, 2024 | BenchmarkingIn-Context Learning | CodeCode Available | 0 |
| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | Aug 20, 2024 | BenchmarkingLanguage Modelling | —Unverified | 0 |
| A Study of PHOC Spatial Region Configurations for Math Formula Retrieval | Aug 17, 2024 | MathRetrieval | —Unverified | 0 |
| Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions | Aug 16, 2024 | DescriptiveHallucination | —Unverified | 0 |
| Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning | Aug 16, 2024 | MathMathematical Reasoning | CodeCode Available | 1 |
| Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | Aug 15, 2024 | Math | —Unverified | 0 |
| Leveraging Web-Crawled Data for High-Quality Fine-Tuning | Aug 15, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Aug 14, 2024 | InformativenessInstruction Following | CodeCode Available | 1 |
| MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark | Aug 14, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| A Perspective on Large Language Models, Intelligent Machines, and Knowledge Acquisition | Aug 13, 2024 | Common Sense ReasoningMath | —Unverified | 0 |
| Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Aug 12, 2024 | GSM8KMath | CodeCode Available | 4 |
| P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for data pruning in LLM Training | Aug 10, 2024 | DiversityLogical Reasoning | —Unverified | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | Aug 9, 2024 | MathMultiple-choice | —Unverified | 0 |
| Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Aug 8, 2024 | GSM8KLanguage Modeling | CodeCode Available | 1 |
| AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People | Aug 5, 2024 | Math | —Unverified | 0 |
| The Logic of Political Survival Revisited: Consequences of Elite Uncertainty Under Authoritarian Rule | Aug 4, 2024 | Math | —Unverified | 0 |
| On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents | Aug 2, 2024 | Code GenerationLarge Language Model | CodeCode Available | 1 |
| MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Aug 1, 2024 | MathMM-Vet | CodeCode Available | 3 |
| Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | Aug 1, 2024 | Math | CodeCode Available | 2 |
| Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Jul 31, 2024 | GSM8KMath | CodeCode Available | 3 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | MathMathematical Reasoning | CodeCode Available | 0 |
| Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Jul 29, 2024 | GSM8KMath | CodeCode Available | 2 |
| Towards Effective and Efficient Continual Pre-training of Large Language Models | Jul 26, 2024 | Math | CodeCode Available | 0 |
| Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Jul 25, 2024 | Imitation LearningLanguage Modeling | —Unverified | 0 |
| Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Jul 24, 2024 | Math | CodeCode Available | 1 |
| MathViz-E: A Case-study in Domain-Specialized Tool-Using Agents | Jul 24, 2024 | Math | CodeCode Available | 1 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem ProvingMath | CodeCode Available | 4 |
| Nerva: a Truly Sparse Implementation of Neural Networks | Jul 24, 2024 | Math | CodeCode Available | 1 |
| TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | Jul 22, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 3 |
| Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Jul 21, 2024 | Arithmetic ReasoningMath | CodeCode Available | 1 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | Jul 20, 2024 | Language ModellingMachine Translation | —Unverified | 0 |
| Learning Goal-Conditioned Representations for Language Reward Models | Jul 18, 2024 | GSM8KMath | CodeCode Available | 1 |
| Weak-to-Strong Reasoning | Jul 18, 2024 | GSM8KMath | CodeCode Available | 2 |
| Prover-Verifier Games improve legibility of LLM outputs | Jul 18, 2024 | Math | CodeCode Available | 0 |