Original title: Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices
Authors: Gaoxiang Duan, Junkai Zhang, Xiaoying Zheng, Yongxin Zhu
In the realm of large models, the Transformer has reshaped modern approaches but faces challenges due to its computationally intensive attention mechanism and its reliance on high-precision floating-point operations. Edge computing environments, which are resource-constrained and favor lower precision, demand innovative solutions. Bitformer is an extension of the Transformer designed for such settings. Its key innovation is a new attention mechanism that replaces floating-point operations with bitwise ones. This shift retains attention's ability to capture complex long-range dependencies while significantly reducing computational complexity, from $O(n^2 d)$ to $O(n^2 T)$, where $T$ is much smaller than the conventional dimensionality parameter $d$. By bringing high-performing models to resource-limited hardware, Bitformer aims to reconcile the demands of modern large models with the constraints of edge computing, offering a promising avenue for future progress.
Original article: https://arxiv.org/abs/2311.13502
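The abstract does not spell out the exact bitwise attention formulation, so the following is only an illustrative sketch of the general idea under one common assumption: queries and keys are binarized to sign bits and their similarity is scored with XOR plus popcount (equivalent to a dot product of $\pm 1$ vectors), while the softmax and value mixing remain in floating point. The function names (`binarize_pack`, `hamming_similarity`, `bitwise_attention`) are hypothetical, not taken from the paper.

```python
import numpy as np

def binarize_pack(x):
    """Binarize a float matrix to sign bits and pack each row into uint8 words.

    x: (n, d) float array. Returns a (n, ceil(d/8)) uint8 array of packed bits,
    so each row is represented by T = ceil(d/8) machine words instead of d floats.
    (Illustrative assumption, not the paper's exact encoding.)
    """
    bits = (x > 0).astype(np.uint8)    # 1 where positive, else 0
    return np.packbits(bits, axis=1)

def hamming_similarity(q_packed, k_packed, d):
    """Score via XOR + popcount: matches minus mismatches, which equals the
    dot product of the underlying +/-1 sign vectors."""
    # XOR marks mismatching bit positions; counting them gives the Hamming distance.
    xor = np.bitwise_xor(q_packed[:, None, :], k_packed[None, :, :])
    mismatches = np.unpackbits(xor, axis=-1, count=d).sum(axis=-1)
    return d - 2 * mismatches          # ranges over [-d, d]

def bitwise_attention(Q, K, V):
    """Attention whose n x n score matrix is computed with bitwise operations only.

    Q, K: (n, d) float; V: (n, d_v) float. Only the score computation is bitwise
    in this sketch; softmax and the value mixing stay in floating point.
    """
    n, d = Q.shape
    scores = hamming_similarity(binarize_pack(Q), binarize_pack(K), d) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 64)) for _ in range(3))
out = bitwise_attention(Q, K, V)
print(out.shape)  # (8, 64)
```

The complexity claim in the abstract maps onto this sketch as follows: each of the $n^2$ query-key scores costs $T$ word-level XOR/popcount operations on packed bits rather than $d$ floating-point multiply-accumulates, hence $O(n^2 T)$ with $T \ll d$.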