| OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | Sep 20, 2023 | Arithmetic ReasoningCode Generation | —Unverified | 0 |
| Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL | Sep 13, 2023 | Arithmetic ReasoningNavigate | CodeCode Available | 1 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Aug 18, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 5 |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | Aug 15, 2023 | Arithmetic ReasoningMath | CodeCode Available | 2 |
| Token-Scaled Logit Distillation for Ternary Weight Generative Language Models | Aug 13, 2023 | Arithmetic ReasoningCommon Sense Reasoning | CodeCode Available | 1 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | Aug 3, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 2 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | CodeCode Available | 8 |
| Model Card and Evaluations for Claude Models | Jul 11, 2023 | Arithmetic ReasoningBug fixing | —Unverified | 0 |
| On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | Jun 23, 2023 | Arithmetic ReasoningKnowledge Distillation | —Unverified | 0 |
| Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Jun 22, 2023 | Arithmetic ReasoningBenchmarking | CodeCode Available | 1 |