| Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity | May 22, 2024 | Language Modelling, Model Editing | Code Available | 2 |
| Decomposing and Editing Predictions by Modeling Model Computation | Apr 17, 2024 | counterfactual, model | Code Available | 2 |
| Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | Feb 16, 2024 | Model Editing | Code Available | 2 |
| BiasEdit: Debiasing Stereotyped Language Models via Model Editing | Mar 11, 2025 | counterfactual, Language Modeling | Code Available | 1 |
| SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models | Mar 10, 2025 | Model Editing | Code Available | 1 |
| The Mirage of Model Editing: Revisiting Evaluation in the Wild | Feb 16, 2025 | Model Editing, Question Answering | Code Available | 1 |
| Reinforced Lifelong Editing for Language Models | Feb 9, 2025 | Model Editing | Code Available | 1 |
| Injecting Universal Jailbreak Backdoors into LLMs in Minutes | Feb 9, 2025 | Model Editing | Code Available | 1 |
| Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit | Aug 19, 2024 | Decoder, Language Modeling | Code Available | 1 |
| Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs | Jul 22, 2024 | Model Editing, Red Teaming | Code Available | 1 |