| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | ChatbotHumanEval | CodeCode Available | 5 |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Jun 6, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Mar 3, 2024 | HallucinationTruthfulQA | CodeCode Available | 2 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive LearningHallucination | CodeCode Available | 2 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain AdaptationMath | CodeCode Available | 2 |
| DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Sep 7, 2023 | TruthfulQA | CodeCode Available | 2 |
| Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Mar 27, 2024 | Large Language ModelMultiple-choice | CodeCode Available | 1 |
| Machine Unlearning in Large Language Models | May 24, 2024 | Machine UnlearningTruthfulQA | CodeCode Available | 1 |
| RAIN: Your Language Models Can Align Themselves without Finetuning | Sep 13, 2023 | Adversarial AttackTruthfulQA | CodeCode Available | 1 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEvalMMLU | CodeCode Available | 1 |