| Interpretability, Then What? Editing Machine Learning Models to Reflect Human Knowledge and Values | Jun 30, 2022 | Additive models, BIG-bench Machine Learning | Code Available | 5 | 5 |
| A Comprehensive Study of Knowledge Editing for Large Language Models | Jan 2, 2024 | knowledge editing, Model Editing | Code Available | 5 | 5 |
| pyvene: A Library for Understanding and Improving PyTorch Models via Interventions | Mar 12, 2024 | Model Editing | Code Available | 5 | 5 |
| Locating and Editing Factual Associations in GPT | Feb 10, 2022 | counterfactual, Model Editing | Code Available | 3 | 5 |
| Neuron-Level Sequential Editing for Large Language Models | Oct 5, 2024 | Model Editing | Code Available | 3 | 5 |
| AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | Oct 3, 2024 | knowledge editing, Model Editing | Code Available | 3 | 5 |
| Sparse Autoencoders Find Highly Interpretable Features in Language Models | Sep 15, 2023 | counterfactual, Language Modelling | Code Available | 3 | 5 |
| MEMORYLLM: Towards Self-Updatable Large Language Models | Feb 7, 2024 | Model Editing | Code Available | 3 | 5 |
| UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | May 20, 2025 | GPU, Lifelong learning | Code Available | 2 | 5 |
| Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis | Sep 21, 2024 | Model Editing, Prediction | Code Available | 2 | 5 |
| Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity | May 22, 2024 | Language Modelling, Model Editing | Code Available | 2 | 5 |
| Decomposing and Editing Predictions by Modeling Model Computation | Apr 17, 2024 | counterfactual, model | Code Available | 2 | 5 |
| Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | Feb 16, 2024 | Model Editing | Code Available | 2 | 5 |
| BadEdit: Backdooring large language models by model editing | Mar 20, 2024 | Backdoor Attack, knowledge editing | Code Available | 1 | 5 |
| Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models | Jan 10, 2023 | Denoising, knowledge editing | Code Available | 1 | 5 |
| Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors | Nov 20, 2022 | Model Editing, World Knowledge | Code Available | 1 | 5 |
| Fast Model Editing at Scale | Oct 21, 2021 | GPU, Language Modelling | Code Available | 1 | 5 |
| History Matters: Temporal Knowledge Editing in Large Language Model | Dec 9, 2023 | knowledge editing, Language Modeling | Code Available | 1 | 5 |
| Injecting Universal Jailbreak Backdoors into LLMs in Minutes | Feb 9, 2025 | Model Editing | Code Available | 1 | 5 |
| Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark | May 27, 2023 | Model Editing, Specificity | Code Available | 1 | 5 |
| Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit | Aug 19, 2024 | Decoder, Language Modeling | Code Available | 1 | 5 |
| Editing Large Language Models: Problems, Methods, and Opportunities | May 22, 2023 | Model Editing | Code Available | 1 | 5 |
| A Unified Framework for Model Editing | Mar 21, 2024 | Memorization, model | Code Available | 1 | 5 |
| Reinforced Lifelong Editing for Language Models | Feb 9, 2025 | Model Editing | Code Available | 1 | 5 |
| Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | Sep 29, 2023 | Model Editing | Code Available | 1 | 5 |