Zero-shot CLIP Class Forgetting via Text-image Space Adaptation

Research output: Contribution to journal › Article › peer-review


Abstract

Efficient class forgetting has attracted significant interest due to the high computational cost of retraining models from scratch whenever classes need to be forgotten. This need arises from data privacy regulations, the necessity of removing outdated information, and the opportunity to enhance model robustness and security.
In this paper we address class forgetting in the vision-language model CLIP. Modern class forgetting methods for CLIP have demonstrated that zero-shot forgetting is achievable by generating synthetic data and fine-tuning both the visual and textual encoders with a regularization loss. Our approach shows that class forgetting in CLIP can be accomplished in a zero-shot manner, without any visual data, by adapting the shared vision-text space of CLIP, thereby making the class forgetting process more efficient. Our method delivers superior results, demonstrating strong performance and complete class removal regardless of the visual encoder used in CLIP. Furthermore, we explore what exactly is targeted by the class forgetting algorithm, discovering some interesting properties of CLIP features.
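The sketch below is a minimal, hypothetical illustration of the text-only idea: a small linear adapter acting on CLIP's shared embedding space is trained purely on text embeddings, so that retained class prompts stay close to their original directions while the forgotten class is pushed away from its own. It is not the paper's algorithm; the Hugging Face transformers CLIP checkpoint, the class names, the prompt template, the loss terms, and the adapter design are all assumptions made for illustration.

    # Hypothetical sketch: text-only adaptation of CLIP's shared vision-text space.
    # Not the paper's method; class names, losses, and the adapter are assumptions.
    import torch
    import torch.nn.functional as F
    from transformers import CLIPModel, CLIPTokenizer

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

    retain_classes = ["dog", "car", "tree"]   # classes to keep (illustrative)
    forget_class = "cat"                      # class to forget (illustrative)
    prompts = [f"a photo of a {c}" for c in retain_classes + [forget_class]]

    with torch.no_grad():
        tok = tokenizer(prompts, padding=True, return_tensors="pt")
        text = F.normalize(model.get_text_features(**tok), dim=-1)
    retain_ref, forget_ref = text[:-1], text[-1:]

    # Learnable linear map on the shared embedding space,
    # initialized to the identity so training starts from the original space.
    dim = text.shape[-1]
    adapter = torch.nn.Linear(dim, dim, bias=False)
    torch.nn.init.eye_(adapter.weight)
    opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)

    for step in range(200):
        adapted = F.normalize(adapter(text), dim=-1)
        # Keep retained class embeddings close to their original directions...
        retain_loss = (1 - (adapted[:-1] * retain_ref).sum(-1)).mean()
        # ...while reducing the forgotten class's similarity to its own.
        forget_loss = (adapted[-1:] * forget_ref).sum(-1).mean()
        loss = retain_loss + forget_loss
        opt.zero_grad(); loss.backward(); opt.step()

    # Because images and text share one space, the same adapter could be
    # applied to model.get_image_features() outputs at inference time,
    # so no visual data is needed during the adaptation itself.

Since such an adapter would be applied identically to image and text embeddings at test time, training it on text alone is enough to change zero-shot classification behaviour, which is the intuition the text-only setting relies on.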
Original language: English
Number of pages: 24
Journal: Transactions on Machine Learning Research
Publication status: Published - 20 Jan 2025

Funding

We gratefully acknowledge Microsoft for compute support through the Accelerating Foundation Models Research grant, and the University of Bath for supporting the studentship.

Keywords

  • Class forgetting
  • Vision-language models
