| LEVER: Learning to Verify Language-to-Code Generation with Execution | Feb 16, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Large Language Models Can Be Easily Distracted by Irrelevant Context | Jan 31, 2023 | Arithmetic Reasoning, Language Modeling | Code Available | 1 |
| Batch Prompting: Efficient Inference with Large Language Model APIs | Jan 19, 2023 | Arithmetic Reasoning, In-Context Learning | Code Available | 1 |
| Large Language Models are Better Reasoners with Self-Verification | Dec 19, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 1 |
| Solving Math Word Problems via Cooperative Reasoning induced Language Models | Oct 28, 2022 | Arithmetic Reasoning, Math | Code Available | 1 |
| OpenCQA: Open-ended Question Answering with Charts | Oct 12, 2022 | Arithmetic Reasoning, Descriptive | Code Available | 1 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | May 28, 2022 | Arithmetic Reasoning, Efficient Exploration | Code Available | 1 |
| UL2: Unifying Language Learning Paradigms | May 10, 2022 | Arithmetic Reasoning, Common Sense Reasoning | Code Available | 1 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Mar 21, 2022 | ARC, Arithmetic Reasoning | Code Available | 1 |
| IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning | Oct 25, 2021 | Arithmetic Reasoning, Mathematical Question Answering | Code Available | 1 |
| Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning | May 10, 2021 | Arithmetic Reasoning, Geometry Problem Solving | Code Available | 1 |
| Learning to Reason for Text Generation from Scientific Tables | Apr 16, 2021 | Arithmetic Reasoning, Articles | Code Available | 1 |
| DCR: Quantifying Data Contamination in LLMs Evaluation | Jul 15, 2025 | Arithmetic Reasoning, Benchmarking | Code Available | 0 |
| DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification | Jul 8, 2025 | ARC, Arithmetic Reasoning | Code Available | 0 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Jun 16, 2025 | Answer Generation, Arithmetic Reasoning | Unverified | 0 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 |
| Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Jun 4, 2025 | Arithmetic Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| DiaBlo: Diagonal Blocks Are Sufficient For Finetuning | Jun 3, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic Reasoning, Image Generation | Unverified | 0 |
| Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning | May 21, 2025 | Arithmetic Reasoning, Instruction Following | Unverified | 0 |
| Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits | May 20, 2025 | Arithmetic Reasoning | Unverified | 0 |
| OMAC: A Broad Optimization Framework for LLM-Based Multi-Agent Collaboration | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 0 |
| Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Apr 30, 2025 | Arithmetic Reasoning, Text to SQL | Unverified | 0 |
| ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning | Apr 9, 2025 | Arithmetic Reasoning, valid | Unverified | 0 |
| Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training | Feb 25, 2025 | Arithmetic Reasoning, Data Augmentation | Unverified | 0 |