Original title: Improving Source-Free Target Adaptation with Vision Transformers Leveraging Domain Representation Images
Authors: Gauransh Sawhney, Daksh Dave, Adeel Ahmed, Jiechao Gao, Khalid Saleem
This article examines how Vision Transformers (ViTs) can be improved for source-free domain adaptation, i.e., adapting a model to a new target environment without access to labeled source data. While Convolutional Neural Networks (CNNs) are the usual choice for this setting, ViTs offer fresh possibilities. The researchers probe the ViT architecture component by component and find that altering individual elements has little effect on overall performance. Leveraging this robustness, they introduce Domain Representation Images (DRIs), domain-specific markers incorporated into training that give the model an explicit signal about the domain being adapted to. To evaluate the idea, they compare ViTs trained with and without DRIs on standard adaptation benchmarks. ViTs with DRIs surpass existing baselines, showing superior adaptability. The study highlights the value of DRIs in making ViTs more effective at adapting to new, unlabeled environments, a meaningful step toward models that transfer robustly across domains.
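To make the idea of a domain-specific signal concrete, here is a minimal, hypothetical sketch in NumPy. It is not the authors' implementation: we simply assume a DRI is an image that gets patch-embedded like the input and appended as extra tokens, so the transformer sees an explicit marker of the current domain. The function names, shapes, and the shared projection matrix are all illustrative assumptions.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an image of shape (H, W, C) into flattened non-overlapping patches."""
    h, w, c = image.shape
    p = patch_size
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)
    return patches

def build_vit_input(image, dri_image, proj, patch_size=16):
    """Embed patches of the input image and the DRI with a shared linear
    projection, then concatenate the DRI tokens onto the input sequence
    so the transformer receives an explicit domain signal (assumption)."""
    img_tokens = patchify(image, patch_size) @ proj      # (N, D)
    dri_tokens = patchify(dri_image, patch_size) @ proj  # (M, D)
    return np.concatenate([img_tokens, dri_tokens], axis=0)

# Hypothetical shapes: 224x224 RGB inputs, 16x16 patches, embedding dim 64.
rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
dri = rng.standard_normal((224, 224, 3))      # stand-in domain representation image
proj = rng.standard_normal((16 * 16 * 3, 64))
tokens = build_vit_input(image, dri, proj)
print(tokens.shape)  # (392, 64): 196 image tokens + 196 DRI tokens
```

The token sequence would then be fed to a standard ViT encoder; the only change versus vanilla ViT input processing is the extra block of domain tokens.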
Original article: https://arxiv.org/abs/2311.12589