| Automate Knowledge Concept Tagging on Math Questions with LLMs | Mar 26, 2024 | Few-Shot Learning; Math | Unverified | 0 |
| To Err is Machine: Vulnerability Detection Challenges LLM Reasoning | Mar 25, 2024 | Code Generation; In-Context Learning | Unverified | 0 |
| MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Mar 21, 2024 | Math; Mathematical Reasoning | Unverified | 0 |
| A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | Mar 21, 2024 | Active Learning; Math | Unverified | 0 |
| From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision | Mar 21, 2024 | Math | Unverified | 0 |
| PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? | Mar 20, 2024 | Abstractive Text Summarization; Continual Pretraining | Unverified | 0 |
| Evolutionary Optimization of Model Merging Recipes | Mar 19, 2024 | Evolutionary Algorithms; Math | Code Available | 5 |
| Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Mar 19, 2024 | Math | Code Available | 1 |
| Instructing Large Language Models to Identify and Ignore Irrelevant Conditions | Mar 19, 2024 | Math; Mathematical Reasoning | Code Available | 0 |
| What Makes Math Word Problems Challenging for LLMs? | Mar 17, 2024 | Math | Code Available | 0 |
| An upper bound of the mutation probability in the genetic algorithm for general 0-1 knapsack problem | Mar 17, 2024 | Diversity; Evolutionary Algorithms | Unverified | 0 |
| Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning | Mar 14, 2024 | Deep Reinforcement Learning; Graph Attention | Code Available | 0 |
| Hydrodynamics of Markets: Hidden Links Between Physics and Finance | Mar 14, 2024 | Math | Unverified | 0 |
| Self-Consistency Boosts Calibration for Math Reasoning | Mar 14, 2024 | GSM8K; Math | Unverified | 0 |
| Sabiá-2: A New Generation of Portuguese Large Language Models | Mar 14, 2024 | Math | Unverified | 0 |
| Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision | Mar 14, 2024 | Math; Reinforcement Learning (RL) | Code Available | 2 |
| The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Mar 14, 2024 | Hallucination; image-classification | Code Available | 1 |
| Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks | Mar 14, 2024 | Math; Skill Generalization | Unverified | 0 |
| Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models | Mar 13, 2024 | Math | Unverified | 0 |
| FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Mar 12, 2024 | Math; Mathematical Reasoning | Unverified | 0 |
| SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models | Mar 12, 2024 | Math; Mathematical Problem-Solving | Code Available | 0 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Mar 12, 2024 | Arithmetic Reasoning; Code Generation | Unverified | 0 |
| Common 7B Language Models Already Possess Strong Math Capabilities | Mar 7, 2024 | GSM8K; Math | Code Available | 5 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Mar 6, 2024 | Benchmarking; Hallucination | Code Available | 0 |
| MathScale: Scaling Instruction Tuning for Mathematical Reasoning | Mar 5, 2024 | GSM8K; Math | Code Available | 0 |
| Evaluating and Optimizing Educational Content with Large Language Model Judgments | Mar 5, 2024 | Language Modeling; Language Modelling | Code Available | 0 |
| Experimenting with Generative AI: Does ChatGPT Really Increase Everyone's Productivity? | Mar 4, 2024 | Econometrics; Math | Unverified | 0 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | Mar 4, 2024 | 1 Image, 2*2 Stitching; Arithmetic Reasoning | Unverified | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math; Phrase Grounding | Unverified | 0 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | Math; Question Answering | Code Available | 1 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation; GSM8K | Code Available | 1 |
| Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | Mar 4, 2024 | GSM8K; Math | Unverified | 0 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Mar 2, 2024 | Math; Misconceptions | Code Available | 1 |
| ClickTree: A Tree-based Method for Predicting Math Students' Performance Based on Clickstream Data | Mar 1, 2024 | Math | Unverified | 0 |
| Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Feb 29, 2024 | Math | Code Available | 2 |
| PRSA: Prompt Stealing Attacks against Real-World Prompt Services | Feb 29, 2024 | Math | Unverified | 0 |
| GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Feb 29, 2024 | GSM8K; Math | Code Available | 2 |
| StarCoder 2 and The Stack v2: The Next Generation | Feb 29, 2024 | Code Completion; Code Generation | Code Available | 7 |
| Data Interpreter: An LLM Agent For Data Science | Feb 28, 2024 | Code Generation; Language Modelling | Unverified | 0 |
| Adversarial Math Word Problem Generation | Feb 27, 2024 | Math | Code Available | 0 |
| MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning | Feb 27, 2024 | 8k; Language Modeling | Code Available | 0 |
| Case-Based or Rule-Based: How Do Transformers Do the Math? | Feb 27, 2024 | Math; Systematic Generalization | Code Available | 1 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | Feb 26, 2024 | GSM8K; Math | Unverified | 0 |
| Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Feb 24, 2024 | Math; Mathematical Reasoning | Code Available | 1 |
| How Do Humans Write Code? Large Models Do It the Same Way Too | Feb 24, 2024 | Code Generation; Math | Code Available | 0 |
| MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Feb 24, 2024 | Language Modeling; Language Modelling | Code Available | 1 |
| Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes | Feb 23, 2024 | Math; Mathematical Reasoning | Code Available | 0 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math; Mathematical Reasoning | Code Available | 1 |
| Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Feb 22, 2024 | Diversity; Math | Code Available | 2 |
| MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models | Feb 20, 2024 | Common Sense Reasoning; Contrastive Learning | Unverified | 0 |