Abstract: Quantization is a neural network compression technique that effectively improves the deployment performance on inference hardware. Fixed-point quantization methods use the same bit-width for ...