Abstract: As the core building block of vision transformers, attention is a powerful tool for capturing long-range dependencies. However, such power comes at a cost: it incurs a huge computational burden and ...
Abstract: Because of its large number of parameters and high computational complexity, the Vision Transformer (ViT) is not suitable for deployment on mobile devices. As a result, the design of efficient ...