| PECC: Problem Extraction and Coding Challenges | Apr 29, 2024 | Code Generation, Math | Code Available | 1 |
| AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Apr 25, 2024 | Code Generation, Math | Code Available | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Apr 23, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Apr 18, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 1 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Apr 3, 2024 | Benchmarking, General Knowledge | Code Available | 1 |
| What is in Your Safe Data? Identifying Benign Data that Breaks Safety | Apr 1, 2024 | Math | Code Available | 1 |
| Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Mar 26, 2024 | Automated Theorem Proving, GSM8K | Code Available | 1 |
| Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Mar 19, 2024 | Math | Code Available | 1 |
| The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Mar 14, 2024 | Hallucination, image-classification | Code Available | 1 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Mar 4, 2024 | Data Augmentation, GSM8K | Code Available | 1 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | Math, Question Answering | Code Available | 1 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Mar 2, 2024 | Math, Misconceptions | Code Available | 1 |
| Case-Based or Rule-Based: How Do Transformers Do the Math? | Feb 27, 2024 | Math, Systematic Generalization | Code Available | 1 |
| Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Feb 24, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Feb 24, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Feb 22, 2024 | Math, Mathematical Reasoning | Code Available | 1 |
| Language Models as Science Tutors | Feb 16, 2024 | GSM8K, Math | Code Available | 1 |
| GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Feb 15, 2024 | Geometry Problem Solving, Math | Code Available | 1 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Feb 14, 2024 | Automated Theorem Proving, Language Modelling | Code Available | 1 |
| Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation | Feb 5, 2024 | Knowledge Graphs, Math | Code Available | 1 |
| MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Feb 2, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| ReGAL: Refactoring Programs to Discover Generalizable Abstractions | Jan 29, 2024 | Date Understanding, Math | Code Available | 1 |
| TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks | Jan 23, 2024 | Math, Question Answering | Code Available | 1 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Jan 21, 2024 | GSM8K, Math | Code Available | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Jan 19, 2024 | GSM8K, Math | Code Available | 1 |