| Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Jun 13, 2024 | Instruction FollowingMath | CodeCode Available | 7 |
| StarCoder 2 and The Stack v2: The Next Generation | Feb 29, 2024 | Code CompletionCode Generation | CodeCode Available | 7 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | Oct 5, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 7 |
| Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models | May 6, 2023 | Math | CodeCode Available | 7 |
| Mistral 7B | Oct 10, 2023 | answerability predictionArithmetic Reasoning | CodeCode Available | 6 |
| Qwen Technical Report | Sep 28, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 6 |
| AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Jun 1, 2023 | Autonomous DrivingCloud Computing | CodeCode Available | 6 |
| GPT-4 Technical Report | Mar 15, 2023 | answerability predictionArithmetic Reasoning | CodeCode Available | 6 |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jan 28, 2022 | Common Sense ReasoningGSM8K | CodeCode Available | 6 |
| Reinforcement Learning from Human Feedback | Apr 16, 2025 | MathPhilosophy | CodeCode Available | 5 |