Abstract: This research proposes and evaluates a novel approach to optimizing matrix multiplication (MatMul) on Huawei Ascend NPUs, motivated by a key insight: during matrix-vector multiplication ...
Abstract: For a variety of ML applications, general matrix multiply (GEMM) based on dot products is the most computationally intensive operation. This paper presents a microarchitecture exploration of ...