| Concise and Organized Perception Facilitates Reasoning in Large Language Models | Oct 5, 2023 | LAMBADA, Math | Unverified | 0 |
| Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference | Oct 4, 2023 | Math, Question Answering | Code Available | 1 |
| The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices | Oct 4, 2023 | Articles, Math | Code Available | 0 |
| Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions | Oct 3, 2023 | Math, Mathematical Reasoning | Unverified | 0 |
| Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance | Oct 3, 2023 | Code Generation, Logical Reasoning | Code Available | 0 |
| Large Language Models as Analogical Reasoners | Oct 3, 2023 | Code Generation, GSM8K | Unverified | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | Oct 3, 2023 | Benchmarking, Instruction Following | Unverified | 0 |
| SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training | Oct 3, 2023 | Contrastive Learning, Equation Discovery | Code Available | 1 |
| MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Oct 3, 2023 | Chatbot, Image Captioning | Code Available | 2 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Oct 3, 2023 | GSM8K, Math | Code Available | 0 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code Available | 1 |
| Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting | Sep 30, 2023 | Math | Unverified | 0 |
| ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving | Sep 29, 2023 | Arithmetic Reasoning, Computational Efficiency | Code Available | 3 |
| L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | Sep 29, 2023 | Code Generation, Math | Unverified | 0 |
| Qwen Technical Report | Sep 28, 2023 | Language Modeling, Language Modelling | Code Available | 6 |
| NLPBench: Evaluating Large Language Models on Solving NLP Problems | Sep 27, 2023 | Benchmarking, Math | Code Available | 1 |
| ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs | Sep 22, 2023 | Math | Code Available | 2 |
| MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Sep 21, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 |
| Fairness Hub Technical Briefs: AUC Gap | Sep 20, 2023 | Fairness, Math | Unverified | 0 |
| Design of Chain-of-Thought in Math Problem Solving | Sep 20, 2023 | Diversity, GSM8K | Code Available | 1 |
| Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning | Sep 19, 2023 | Instruction Following, Language Modeling | Code Available | 1 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8K, HellaSwag | Unverified | 0 |
| Odd period cycles and ergodic properties in price dynamics for an exchange economy | Sep 17, 2023 | Math | Unverified | 0 |
| ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems | Sep 16, 2023 | Electrical Engineering, Math | Unverified | 0 |