|In this work, we propose an attention-based method to quantize convolutional neural networks (CNNs) so that they run at low precision (binary and multi-bit). Intuitively, high-quality pictures make objects easy to distinguish. However, even in low-quality black-and-white photos (analogous to low precision), various features can still be distinguished well and the content is easily understood. Based on this intuition, we introduce an attention-based block called squeeze-and-threshold (ST) that adjusts different features to different ranges and learns the best threshold to distinguish (quantize) them. Furthermore, to eliminate the extra computation that the ST block would add at inference time, we propose a momentum-based method that learns the inference threshold during the training stage. Additionally, with the help of the ST block, our quantization approach trains faster, taking fewer than half the training epochs of prior multi-bit quantization works. Experimental results on different datasets and networks show the versatility of our method and demonstrate state-of-the-art performance.|
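
To make the idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the ST block resembles a squeeze-and-excitation branch (global average pooling followed by two small linear layers) that outputs one threshold per channel, that binarization compares each activation with that threshold, and that the "momentum-based" inference threshold is an exponential moving average of the batch thresholds. The function names, the `tanh` range, the weight shapes, and the momentum value are all hypothetical, and the weights are random placeholders rather than trained parameters.

```python
import numpy as np

def squeeze_and_threshold(x, w1, w2):
    """Assumed ST branch. x: (N, C, H, W) -> thresholds of shape (N, C)."""
    s = x.mean(axis=(2, 3))          # squeeze: global average pool -> (N, C)
    h = np.maximum(s @ w1, 0.0)      # bottleneck layer with ReLU -> (N, C // r)
    return np.tanh(h @ w2)           # per-channel thresholds in (-1, 1)

def binarize(x, thr):
    """Quantize to {-1, +1} by comparing activations with channel thresholds.

    thr may be (N, C) (input-dependent, training) or (C,) (fixed, inference).
    """
    return np.where(x >= thr[..., None, None], 1.0, -1.0)

def update_inference_threshold(running, batch_thr, momentum=0.9):
    """Assumed momentum rule: moving average of the batch-mean threshold."""
    return momentum * running + (1.0 - momentum) * batch_thr.mean(axis=0)

# Hypothetical shapes: 4 samples, 8 channels, reduction ratio 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))
w1 = 0.1 * rng.standard_normal((8, 2))
w2 = 0.1 * rng.standard_normal((2, 8))

running = np.zeros(8)
thr = squeeze_and_threshold(x, w1, w2)         # training: thresholds depend on x
q_train = binarize(x, thr)
running = update_inference_threshold(running, thr)
q_infer = binarize(x, running)                 # inference: fixed thresholds only
```

Under these assumptions, the inference path needs only the stored `running` vector, so the pooling and linear layers of the ST branch add no cost once training is finished.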