Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation
Hanwen Shen, Ting Ying, Jiajie Lu, Shanshan Wang
Abstract
Although debiased LLMs perform well on known bias patterns, they often fail to generalize to unfamiliar biased prompts, producing toxic outputs. We first validate via OOD detection that such high-bias prompts constitute a distribution shift, and show that static models degrade under this shift. To adapt on the fly, we propose CAP-TTA, a test-time adaptation framework that performs context-aware LoRA updates only when a bias-risk trigger exceeds a threshold, using a precomputed diagonal preconditioner for fast, stable updates. Across toxic-prompt settings and benchmarks, CAP-TTA reduces bias (confirmed by human evaluation) while achieving much lower update latency than AdamW/SGD; it also mitigates catastrophic forgetting, significantly improving narrative fluency over a SOTA debiasing baseline while maintaining comparable debiasing effectiveness.
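The abstract describes a gated, preconditioned update rule: LoRA parameters are adapted at test time only when a bias-risk trigger fires, and the gradient step is scaled elementwise by a precomputed diagonal preconditioner. The following is a minimal sketch of that control flow; the function names (`bias_risk`, `maybe_adapt`), the logistic risk score, and the specific preconditioner values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bias_risk(prompt_features, w_risk):
    # Hypothetical bias-risk trigger: a logistic score over prompt features.
    z = float(prompt_features @ w_risk)
    return 1.0 / (1.0 + np.exp(-z))

def preconditioned_step(lora_params, grad, diag_precond, lr=1e-2, eps=1e-8):
    # Diagonal preconditioning: elementwise rescaling of the gradient by a
    # vector precomputed offline (e.g., a curvature estimate), so the
    # test-time update is a single cheap elementwise operation.
    return lora_params - lr * grad / (diag_precond + eps)

def maybe_adapt(lora_params, grad, diag_precond, risk, tau=0.5):
    # Gate the update: adapt only when the risk score exceeds the threshold;
    # otherwise leave the (debiased) base parameters untouched.
    if risk > tau:
        return preconditioned_step(lora_params, grad, diag_precond)
    return lora_params
```

Because the preconditioner is fixed ahead of time, each triggered update costs one elementwise multiply-add, which is the source of the latency advantage over stateful optimizers like AdamW claimed in the abstract.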