Original title: Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
Authors: James Seale Smith, Yen-Chang Hsu, Zsolt Kira, Yilin Shen, Hongxia Jin
In this article, the researchers explore customizing text-to-image diffusion models for multiple concepts presented sequentially, a setting they call continual diffusion. The central challenge is scaling such methods to long concept sequences without forgetting previously learned concepts. While prior work can mitigate forgetting, the researchers find that its capacity to learn new concepts saturates as the sequence grows longer.
To address this challenge, the researchers propose a new method called STack-And-Mask INcremental Adapters (STAMINA). STAMINA consists of low-rank attention-masked adapters and customized MLP tokens. These components enhance the model's low-rank fine-tuning abilities for sequential concept learning by using learnable hard-attention masks parameterized with low-rank MLPs, enabling precise and scalable learning through sparse adaptation.
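To make the idea concrete, below is a minimal sketch (not the authors' code) of a low-rank adapter whose weight update is gated by a learnable hard mask produced by small MLPs. The class name, ranks, and the outer-product mask parameterization are illustrative assumptions; the sketch only conveys the general pattern of "low-rank update × hard mask, trained with a straight-through estimator."

```python
# Hypothetical sketch of a masked low-rank adapter in the spirit of STAMINA.
# Assumed: W_eff = W + M * (B @ A), with M a hard 0/1 mask whose logits come
# from low-rank MLPs; a straight-through estimator keeps the mask trainable.
import torch
import torch.nn as nn


class MaskedLowRankAdapter(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 4, mask_rank: int = 8):
        super().__init__()
        out_f, in_f = base_linear.weight.shape
        self.base = base_linear                       # frozen pre-trained layer
        self.base.weight.requires_grad_(False)
        # LoRA-style low-rank residual update: delta_W = B @ A
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Small MLPs producing per-row / per-column mask logits, combined by an
        # outer product so every entry of delta_W gets a score (an assumption).
        self.row_mlp = nn.Sequential(nn.Linear(mask_rank, mask_rank), nn.GELU(),
                                     nn.Linear(mask_rank, out_f))
        self.col_mlp = nn.Sequential(nn.Linear(mask_rank, mask_rank), nn.GELU(),
                                     nn.Linear(mask_rank, in_f))
        self.row_seed = nn.Parameter(torch.randn(mask_rank))
        self.col_seed = nn.Parameter(torch.randn(mask_rank))

    def hard_mask(self) -> torch.Tensor:
        # Soft scores in (0, 1), thresholded to {0, 1}; the straight-through
        # trick passes the soft gradient through the hard threshold.
        logits = torch.outer(self.row_mlp(self.row_seed), self.col_mlp(self.col_seed))
        soft = torch.sigmoid(logits)
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.hard_mask() * (self.B @ self.A)  # sparse low-rank update
        return self.base(x) + x @ delta.t()
```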
The researchers demonstrate that STAMINA outperforms the prior state-of-the-art method for continual customization of text-to-image models on a benchmark of landmarks and human faces. They also extend their method to image classification, where it achieves state-of-the-art performance on a standard continual learning benchmark as well. Importantly, all introduced parameters can be folded back into the model after training, so inference incurs no additional parameter cost.
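Continuing the hypothetical sketch above, folding the learned update back into the frozen weight could look like the following, leaving a plain linear layer for inference with no extra parameters (again an assumption about the mechanics, not the authors' implementation):

```python
# Fold the masked low-rank update into the base weight after training.
@torch.no_grad()
def fold_adapter(adapter: MaskedLowRankAdapter) -> nn.Linear:
    folded = nn.Linear(adapter.base.in_features, adapter.base.out_features,
                       bias=adapter.base.bias is not None)
    folded.weight.copy_(adapter.base.weight
                        + adapter.hard_mask() * (adapter.B @ adapter.A))
    if adapter.base.bias is not None:
        folded.bias.copy_(adapter.base.bias)
    return folded
```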
Original article: https://arxiv.org/abs/2311.18763