| Title | Date | Tasks | Code | | |
| --- | --- | --- | --- | --- | --- |
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 | 5 |
| DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Sep 7, 2023 | TruthfulQA | Code Available | 2 | 5 |
| In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Mar 3, 2024 | Hallucination, TruthfulQA | Code Available | 2 | 5 |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Jun 6, 2023 | Language Modeling | Code Available | 2 | 5 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 | 5 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive Learning, Hallucination | Code Available | 2 | 5 |
| Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Sep 13, 2023 | Ethics, TruthfulQA | Code Available | 1 | 5 |
| Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Mar 27, 2024 | Large Language Model, Multiple-choice | Code Available | 1 | 5 |
| Integrative Decoding: Improve Factuality via Implicit Self-consistency | Oct 2, 2024 | TruthfulQA | Code Available | 1 | 5 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEval, MMLU | Code Available | 1 | 5 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Sep 8, 2021 | Language Modeling | Code Available | 1 | 5 |
| Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Aug 18, 2023 | MMLU, Red Teaming | Code Available | 1 | 5 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Dec 25, 2023 | Hallucination, Hallucination Evaluation | Code Available | 1 | 5 |
| Tool-Augmented Reward Modeling | Oct 2, 2023 | TruthfulQA | Code Available | 1 | 5 |
| Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Dec 29, 2023 | TruthfulQA | Code Available | 1 | 5 |
| Machine Unlearning in Large Language Models | May 24, 2024 | Machine Unlearning, TruthfulQA | Code Available | 1 | 5 |
| RAIN: Your Language Models Can Align Themselves without Finetuning | Sep 13, 2023 | Adversarial Attack, TruthfulQA | Code Available | 1 | 5 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8K, Logical Reasoning | Code Available | 0 | 5 |
| Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding | Jun 19, 2024 | Language Modeling | Code Available | 0 | 5 |
| SaGE: Evaluating Moral Consistency in Large Language Models | Feb 21, 2024 | Decision Making, HellaSwag | Code Available | 0 | 5 |
| Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | May 24, 2023 | TriviaQA, TruthfulQA | Code Available | 0 | 5 |
| A test suite of prompt injection attacks for LLM-based machine translation | Oct 7, 2024 | Machine Translation, Translation | Code Available | 0 | 5 |
| Multi-Agent Reinforcement Learning with Focal Diversity Optimization | Feb 6, 2025 | Diversity, Multi-agent Reinforcement Learning | Code Available | 0 | 5 |
| Instruction Tuning with Human Curriculum | Oct 14, 2023 | ARC, MMLU | Code Available | 0 | 5 |
| CHAIR -- Classifier of Hallucination as Improver | Jan 5, 2025 | Hallucination, MMLU | Code Available | 0 | 5 |
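Several entries above (DoLa, Integrative Decoding, DeLTa) improve factuality at decoding time by rescoring next-token distributions rather than fine-tuning the model. As a concrete illustration, below is a minimal NumPy sketch of DoLa-style layer-contrast scoring: it assumes you already have next-token logits from a final ("mature") layer and an earlier ("premature") layer of the same model, and the function name `dola_contrast`, the `alpha` threshold, and the toy inputs are illustrative, not the authors' implementation.

```python
import numpy as np

def dola_contrast(mature_logits, premature_logits, alpha=0.1):
    """DoLa-style contrastive next-token scoring (illustrative sketch).

    mature_logits / premature_logits: vocab-sized logit vectors from the
    model's final layer and an earlier layer (via early-exit LM head).
    alpha: adaptive plausibility threshold relative to the mature layer's
    most probable token.
    """
    def log_softmax(x):
        x = x - x.max()          # numerical stability
        return x - np.log(np.exp(x).sum())

    log_p_mature = log_softmax(mature_logits)
    log_p_premature = log_softmax(premature_logits)

    # Adaptive plausibility constraint: keep only tokens the mature
    # layer already finds plausible (prob >= alpha * max prob).
    keep = log_p_mature >= np.log(alpha) + log_p_mature.max()

    # Contrast the two layers; implausible tokens are masked out.
    return np.where(keep, log_p_mature - log_p_premature, -np.inf)

# Toy usage: random vectors stand in for the two layers' logits.
rng = np.random.default_rng(0)
mature = rng.normal(size=32)
premature = rng.normal(size=32)
print(int(np.argmax(dola_contrast(mature, premature))))  # greedy token id
```

The key design choice, shared across these contrastive decoders, is that the premature layer acts as a baseline: tokens whose probability grows as information flows to later layers are rewarded, while the plausibility mask prevents the contrast from promoting tokens the full model considers unlikely.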