| pyvene: A Library for Understanding and Improving PyTorch Models via Interventions | Mar 12, 2024 | Model Editing | Code Available | 5 |
| A Comprehensive Study of Knowledge Editing for Large Language Models | Jan 2, 2024 | knowledge editing, Model Editing | Code Available | 5 |
| Interpretability, Then What? Editing Machine Learning Models to Reflect Human Knowledge and Values | Jun 30, 2022 | Additive models, BIG-bench Machine Learning | Code Available | 5 |
| Neuron-Level Sequential Editing for Large Language Models | Oct 5, 2024 | Model Editing | Code Available | 3 |
| AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | Oct 3, 2024 | knowledge editing, Model Editing | Code Available | 3 |
| MEMORYLLM: Towards Self-Updatable Large Language Models | Feb 7, 2024 | Model Editing | Code Available | 3 |
| Sparse Autoencoders Find Highly Interpretable Features in Language Models | Sep 15, 2023 | counterfactual, Language Modelling | Code Available | 3 |
| Locating and Editing Factual Associations in GPT | Feb 10, 2022 | counterfactual, Model Editing | Code Available | 3 |
| UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | May 20, 2025 | GPU, Lifelong learning | Code Available | 2 |
| Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis | Sep 21, 2024 | Model Editing, Prediction | Code Available | 2 |
| Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity | May 22, 2024 | Language Modelling, Model Editing | Code Available | 2 |
| Decomposing and Editing Predictions by Modeling Model Computation | Apr 17, 2024 | counterfactual, model | Code Available | 2 |
| Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | Feb 16, 2024 | Model Editing | Code Available | 2 |
| BiasEdit: Debiasing Stereotyped Language Models via Model Editing | Mar 11, 2025 | counterfactual, Language Modeling | Code Available | 1 |
| SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models | Mar 10, 2025 | Model Editing | Code Available | 1 |
| The Mirage of Model Editing: Revisiting Evaluation in the Wild | Feb 16, 2025 | Model Editing, Question Answering | Code Available | 1 |
| Reinforced Lifelong Editing for Language Models | Feb 9, 2025 | Model Editing | Code Available | 1 |
| Injecting Universal Jailbreak Backdoors into LLMs in Minutes | Feb 9, 2025 | Model Editing | Code Available | 1 |
| Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit | Aug 19, 2024 | Decoder, Language Modeling | Code Available | 1 |
| Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs | Jul 22, 2024 | Model Editing, Red Teaming | Code Available | 1 |
| Perturbation-Restrained Sequential Model Editing | May 27, 2024 | Continual Learning, model | Code Available | 1 |
| Large Scale Knowledge Washing | May 26, 2024 | Decoder, Memorization | Code Available | 1 |
| Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning | May 6, 2024 | knowledge editing, Lifelong learning | Code Available | 1 |
| On Mechanistic Knowledge Localization in Text-to-Image Generative Models | May 2, 2024 | Model Editing | Code Available | 1 |
| Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 | May 1, 2024 | Language Modeling, Language Modelling | Code Available | 1 |