TY - GEN
T1 - Searching for N:M Fine-grained Sparsity of Weights and Activations in Neural Networks
AU - Akiva-Hochman, Ruth
AU - Finder, Shahaf E.
AU - Turek, Javier S.
AU - Treister, Eran
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Sparsity in deep neural networks has been extensively studied to compress and accelerate models for environments with limited resources. The general approach of pruning aims at enforcing sparsity on the obtained model, with minimal accuracy loss, but with a sparsity structure that enables acceleration on hardware. The sparsity can be enforced on either the weights or the activations of the network, and existing works tend to focus on one of the two for the entire network. In this paper, we suggest a strategy based on Neural Architecture Search (NAS) to sparsify both activations and weights throughout the network, while utilizing the recent approach of N:M fine-grained structured sparsity that enables practical acceleration on dedicated GPUs. We show that a combination of weight and activation pruning is superior to each option separately. Furthermore, during training, the choice between pruning the weights or the activations can be motivated by practical inference costs (e.g., memory bandwidth). We demonstrate the efficiency of the approach on several image classification datasets.
AB - Sparsity in deep neural networks has been extensively studied to compress and accelerate models for environments with limited resources. The general approach of pruning aims at enforcing sparsity on the obtained model, with minimal accuracy loss, but with a sparsity structure that enables acceleration on hardware. The sparsity can be enforced on either the weights or the activations of the network, and existing works tend to focus on one of the two for the entire network. In this paper, we suggest a strategy based on Neural Architecture Search (NAS) to sparsify both activations and weights throughout the network, while utilizing the recent approach of N:M fine-grained structured sparsity that enables practical acceleration on dedicated GPUs. We show that a combination of weight and activation pruning is superior to each option separately. Furthermore, during training, the choice between pruning the weights or the activations can be motivated by practical inference costs (e.g., memory bandwidth). We demonstrate the efficiency of the approach on several image classification datasets.
KW - Activation pruning
KW - N:M fine-grained sparsity
KW - Neural architecture search
KW - Weight pruning
UR - http://www.scopus.com/inward/record.url?scp=85150975050&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-25082-8_9
DO - 10.1007/978-3-031-25082-8_9
M3 - Conference contribution
AN - SCOPUS:85150975050
SN - 9783031250811
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 130
EP - 143
BT - Computer Vision – ECCV 2022 Workshops, Proceedings
A2 - Karlinsky, Leonid
A2 - Michaeli, Tomer
A2 - Nishino, Ko
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th European Conference on Computer Vision, ECCV 2022
Y2 - 23 October 2022 through 27 October 2022
ER -