| Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Dec 7, 2023 | Math, Multiple-choice | Code Available | 1 |
| Eliciting Latent Knowledge from Quirky Language Models | Dec 2, 2023 | Anomaly Detection, Math | Code Available | 1 |
| MathGloss: Building mathematical glossaries from text | Nov 21, 2023 | Math | Code Available | 1 |
| FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains | Nov 16, 2023 | Math, Math Word Problem Solving | Code Available | 1 |
| DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents | Nov 16, 2023 | Math | Code Available | 1 |
| StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving | Nov 15, 2023 | Math | Code Available | 1 |
| Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration | Nov 14, 2023 | Diversity, Math | Code Available | 1 |
| Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset | Nov 9, 2023 | Math, Natural Language Understanding | Code Available | 1 |
| Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs | Nov 8, 2023 | Fairness, Math | Code Available | 1 |
| Implicit Chain of Thought Reasoning via Knowledge Distillation | Nov 2, 2023 | Knowledge Distillation, Math | Code Available | 1 |