Original title: Linear Log-Normal Attention with Unbiased Concentration
Authors: Yury Nahshan, Joseph Kampeas, Emir Haleva
The article addresses a core limitation of Transformer models: the quadratic complexity of self-attention, which hampers scaling to long sequences and high-resolution images. The authors analyze the distribution of the attention matrix and its ability to concentrate, and introduce Linear Log-Normal Attention, a linear-complexity self-attention mechanism designed to match the log-normal distribution and concentration behavior of standard softmax attention. Evaluated on natural language benchmarks, the method outperforms other linearized attention alternatives, making it a promising direction for improving Transformer scalability. The authors have released their code, enabling further exploration and refinement on long sequences and complex image data.
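As a rough illustration of the linearized-attention idea the paper builds on, the sketch below computes attention in O(n) by applying an exponential feature map to queries and keys and reordering the matrix products. This is only an assumed, simplified version: the feature map and the `alpha_q`/`alpha_k` scale parameters are hypothetical placeholders, not the paper's moment-matched Linear Log-Normal parameterization, which is available in the authors' released code.

```python
# Illustrative sketch of linear attention with an exponential feature map.
# NOTE: `alpha_q` and `alpha_k` are hypothetical scale parameters for this
# sketch; the paper derives its own parameterization to match the log-normal
# statistics of softmax attention.
import torch


def linear_attention(q, k, v, alpha_q=1.0, alpha_k=1.0, eps=1e-6):
    """Approximate softmax(QK^T)V in O(n) by factoring it as
    phi(Q) (phi(K)^T V), with phi(x) = exp(alpha * x).

    q, k, v: tensors of shape (batch, seq_len, dim).
    """
    # Shift by a global max before exponentiating for numerical stability;
    # a uniform shift rescales numerator and denominator equally and cancels.
    phi_q = torch.exp(alpha_q * (q - q.amax(dim=(-2, -1), keepdim=True)))
    phi_k = torch.exp(alpha_k * (k - k.amax(dim=(-2, -1), keepdim=True)))

    # kv: (batch, dim, dim) summary of keys and values; cost is linear in seq_len.
    kv = torch.einsum("bnd,bne->bde", phi_k, v)
    # Normalizer: row sums of the implicit attention matrix.
    z = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + eps
    return torch.einsum("bnd,bde->bne", phi_q, kv) / z.unsqueeze(-1)


# Usage example
q = torch.randn(2, 128, 64)
k = torch.randn(2, 128, 64)
v = torch.randn(2, 128, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([2, 128, 64])
```

Because the key-value summary `kv` and the normalizer are computed once and reused for every query, memory and compute grow linearly with sequence length instead of quadratically.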
Original article: https://arxiv.org/abs/2311.13541