Can Vision Transformers Enhance Source-Free Target Adaptation?

Original title: Improving Source-Free Target Adaptation with Vision Transformers Leveraging Domain Representation Images

Authors: Gauransh Sawhney, Daksh Dave, Adeel Ahmed, Jiechao Gao, Khalid Saleem

This article examines how Vision Transformers (ViTs) can be improved for source-free domain adaptation, where a model trained on a source domain must adapt to a new, unlabeled target domain without access to the original source data. While Convolutional Neural Networks (CNNs) dominate this setting, ViTs open up fresh possibilities. The researchers dissect the ViT architecture, tweaking its individual components and finding that modifying one of them barely affects performance. Exploiting this slack, they introduce Domain Representation Images (DRIs), domain-specific markers fed into the ViT during training. To validate the idea, they compare ViTs trained with and without DRIs on standard adaptation benchmarks. The result? DRI-equipped ViTs surpass existing baselines, showing markedly better adaptability. The study underscores how DRIs boost ViT effectiveness on new, unlabeled target domains—a meaningful step toward models that transfer more readily across domains.
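To make the DRI idea concrete, here is a minimal PyTorch sketch of one plausible way to inject a domain-representation image into a ViT: the DRI is embedded like an ordinary image and pooled into a single extra token that rides alongside the class and patch tokens. The class name `DRIViT`, the pooling choice, and all hyperparameters are illustrative assumptions for this summary, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class DRIViT(nn.Module):
    """Toy ViT that consumes an extra Domain Representation Image (DRI).

    Assumed design: the DRI is patch-embedded like a normal image, mean-pooled
    into one 'domain token', and prepended to the sequence next to CLS.
    """

    def __init__(self, img_size=224, patch=16, dim=384, depth=6, heads=6, n_classes=10):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Standard ViT pieces: patch embedding, CLS token, positional embeddings.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # One extra position for CLS and one for the DRI token.
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 2, dim))
        # The DRI is itself an image; embed it the same way (assumed choice).
        self.dri_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True),
            num_layers=depth,
        )
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, dri):
        b = x.size(0)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, D)
        dri_tok = self.dri_embed(dri).flatten(2).mean(dim=2)     # (B, D) pooled domain token
        cls = self.cls_token.expand(b, -1, -1)
        # Prepend CLS and the domain token, then encode as usual.
        seq = torch.cat([cls, dri_tok.unsqueeze(1), tokens], dim=1) + self.pos
        return self.head(self.encoder(seq)[:, 0])  # classify from CLS

model = DRIViT()
images = torch.randn(2, 3, 224, 224)  # unlabeled target-domain batch
dri = torch.randn(2, 3, 224, 224)     # the domain's DRI, repeated per sample
print(model(images, dri).shape)       # torch.Size([2, 10])
```

Under this sketch, swapping the DRI is the only change needed to signal a different domain to the model; the paper's actual construction of DRIs and their injection point may differ.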

Original article: https://arxiv.org/abs/2311.12589