| Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning | Dec 29, 2023 | TruthfulQA | CodeCode Available | 1 |
| Alleviating Hallucinations of Large Language Models through Induced Hallucinations | Dec 25, 2023 | HallucinationHallucination Evaluation | CodeCode Available | 1 |
| Tool-Augmented Reward Modeling | Oct 2, 2023 | TruthfulQA | CodeCode Available | 1 |
| Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Sep 13, 2023 | EthicsTruthfulQA | CodeCode Available | 1 |
| RAIN: Your Language Models Can Align Themselves without Finetuning | Sep 13, 2023 | Adversarial AttackTruthfulQA | CodeCode Available | 1 |
| Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Aug 18, 2023 | MMLURed Teaming | CodeCode Available | 1 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Sep 8, 2021 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| Unsupervised Elicitation of Language Models | Jun 11, 2025 | GSM8KTruthfulQA | CodeCode Available | 0 |
| Model Unlearning via Sparse Autoencoder Subspace Guided Projections | May 30, 2025 | Adversarial Robustnessfeature selection | —Unverified | 0 |
| Shadows in the Attention: Contextual Perturbation and Representation Drift in the Dynamics of Hallucination in LLMs | May 22, 2025 | HallucinationTruthfulQA | —Unverified | 0 |