Optimal Affine Activation Steering Methods for Unlearning
Ross Baker
Increasing social and legal standards surrounding data privacy have motivated recent attention towards machine unlearning techniques. In this paper we explain a variety of affine activation steering methods for targeted knowledge removal, as well as propose 3 methods of our own. We then compare their effects on transformers in a toy model designed to test both recall and inference of underlying token patterns.