| Paper | Date | Tasks | Code | Results |
|---|---|---|---|---|
| RLHF Workflow: From Reward Modeling to Online RLHF | May 13, 2024 | Chatbot, HumanEval | Code Available | 5 |
| In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation | Mar 3, 2024 | Hallucination, TruthfulQA | Code Available | 2 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive Learning, Hallucination | Code Available | 2 |
| Tuning Language Models by Proxy | Jan 16, 2024 | Domain Adaptation, Math | Code Available | 2 |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Jun 6, 2023 | Language Modeling | Code Available | 2 |
| DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Sep 7, 2023 | TruthfulQA | Code Available | 2 |
| Machine Unlearning in Large Language Models | May 24, 2024 | Machine Unlearning, TruthfulQA | Code Available | 1 |
| Tool-Augmented Reward Modeling | Oct 2, 2023 | TruthfulQA | Code Available | 1 |
| Integrative Decoding: Improve Factuality via Implicit Self-consistency | Oct 2, 2024 | TruthfulQA | Code Available | 1 |
| Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Dec 29, 2023 | TruthfulQA | Code Available | 1 |
| Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Sep 13, 2023 | Ethics, TruthfulQA | Code Available | 1 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Sep 8, 2021 | Language Modeling | Code Available | 1 |
| Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Mar 27, 2024 | Large Language Model, Multiple-choice | Code Available | 1 |
| Instruction Tuning With Loss Over Instructions | May 23, 2024 | HumanEval, MMLU | Code Available | 1 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Dec 25, 2023 | Hallucination, Hallucination Evaluation | Code Available | 1 |
| RAIN: Your Language Models Can Align Themselves without Finetuning | Sep 13, 2023 | Adversarial Attack, TruthfulQA | Code Available | 1 |
| Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Aug 18, 2023 | MMLU, Red Teaming | Code Available | 1 |
| Evaluating Consistencies in LLM responses through a Semantic Clustering of Question Answering | Oct 20, 2024 | Language Modeling, Large Language Model | Unverified | 0 |
| A Debate-Driven Experiment on LLM Hallucinations and Accuracy | Oct 25, 2024 | Fact Checking, Hallucination | Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | Unverified | 0 |
| Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning | Oct 16, 2024 | Contrastive Learning, Graph Construction | Unverified | 0 |
| Cost-Saving LLM Cascades with Early Abstention | Feb 13, 2025 | GSM8K, MMLU | Unverified | 0 |
| Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused | Aug 16, 2024 | Hallucination, TruthfulQA | Unverified | 0 |
| Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer | Apr 17, 2025 | Conformal Prediction, TruthfulQA | Unverified | 0 |
| Maintaining Informative Coherence: Migrating Hallucinations in Large Language Models via Absorbing Markov Chains | Oct 27, 2024 | Text Generation, TruthfulQA | Unverified | 0 |
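Several entries above (DoLa, "Lower Layer Matters", Iter-AHMCL) rely on contrastive decoding to suppress hallucinations. Below is a minimal numpy sketch of the layer-contrast idea behind DoLa: score each candidate token by the difference between the final layer's log-probabilities and an early ("premature") layer's, restricted to a plausibility-filtered head. The function names, the `alpha` value, and the toy logits are illustrative assumptions, not code from any of the listed papers.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over a 1-D logit vector.
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def dola_contrast(final_logits, early_logits, alpha=0.1):
    """Sketch of DoLa-style layer contrast (assumed simplification).

    Tokens are scored by log p_final - log p_early, but only where the
    final-layer probability is at least alpha times the max probability
    (the adaptive plausibility constraint); the rest are masked out.
    """
    logp_final = log_softmax(final_logits)
    logp_early = log_softmax(early_logits)
    keep = logp_final >= (logp_final.max() + np.log(alpha))
    return np.where(keep, logp_final - logp_early, -np.inf)

# Toy usage with made-up logits over a 5-token vocabulary: the contrast
# promotes tokens whose probability grew between the early and final layers.
final = np.array([2.0, 1.5, 0.3, -1.0, -2.0])
early = np.array([1.8, 0.2, 0.5, -0.5, -1.5])
next_token = int(np.argmax(dola_contrast(final, early)))
```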