| Title | Date | Tasks | Code | ★ |
| --- | --- | --- | --- | --- |
| RAIN: Your Language Models Can Align Themselves without Finetuning | Sep 13, 2023 | Adversarial Attack, TruthfulQA | Code Available | 1 |
| Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics | Sep 13, 2023 | Ethics, TruthfulQA | Code Available | 1 |
| DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models | Sep 7, 2023 | TruthfulQA | Code Available | 2 |
| Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment | Aug 18, 2023 | MMLU, Red Teaming | Code Available | 1 |
| Semantic Consistency for Assuring Reliability of Large Language Models | Aug 17, 2023 | Question Answering, Text Generation | Unverified | 0 |
| Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Jun 6, 2023 | Language Modeling | Code Available | 2 |
| Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | May 24, 2023 | TriviaQA, TruthfulQA | Unverified | 0 |
| Measuring Reliability of Large Language Models through Semantic Consistency | Nov 10, 2022 | Text Generation, TruthfulQA | Code Available | 0 |
| Teaching language models to support answers with verified quotes | Mar 21, 2022 | Fact Checking, Natural Questions | Unverified | 0 |
| TruthfulQA: Measuring How Models Mimic Human Falsehoods | Sep 8, 2021 | Language Modeling | Code Available | 1 |