This page collects papers describing the "learning-compression" (LC) algorithm, a nonconvex optimisation algorithm for finding an optimally compressed deep neural net, i.e., one which minimises a loss (such as classification error) while being compressed as much as possible. The LC algorithm alternates two steps till convergence: a learning (L) step, which learns the neural net, and a compression (C) step, which compresses its parameters. It can handle many types of compression (quantization, binarization, pruning, low-rank decomposition, etc.) by simply calling the relevant compression routine in the C step.
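As a minimal illustrative sketch (not the authors' implementation, and using a toy least-squares "model" rather than a neural net), the L/C alternation for scalar quantization might look as follows. A quadratic penalty with increasing weight mu couples the model weights w to their quantized counterpart theta; all names (`c_step`, `l_step`, the data) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "model": linear least squares, loss L(w) = ||Xw - y||^2.
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20)

def c_step(w, K=4, iters=20):
    # C step: quantize w onto a K-entry codebook (1D k-means / Lloyd's algorithm).
    codebook = np.quantile(w, np.linspace(0.0, 1.0, K))
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        for k in range(K):
            if np.any(assign == k):
                codebook[k] = w[assign == k].mean()
    return codebook[assign]

def l_step(w, theta, mu, lr=1e-3, steps=200):
    # L step: minimize L(w) + (mu/2)||w - theta||^2 by gradient descent.
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) + mu * (w - theta)
        w = w - lr * grad
    return w

# LC loop: alternate L and C steps while driving the penalty weight mu
# upward, pulling w toward the feasible (quantized) set.
w = np.linalg.lstsq(X, y, rcond=None)[0]   # reference (uncompressed) model
theta = c_step(w)
for j in range(25):
    mu = 1e-2 * 1.5**j
    w = l_step(w, theta, mu)
    theta = c_step(w)
```

At convergence, `theta` is a compressed model whose 20 weights take at most 4 distinct values; swapping the quantization routine in `c_step` for pruning or a low-rank projection changes the compression type without touching the L step.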
This work has been done in collaboration with my students Magzhan Gabidolla, Kuat Gazizov, Yerlan Idelbayev and Arman Zharmagambetov.
It has been funded in part by:

Second Summer School on Optimization, Big Data and Applications (OBA) (Jun. 30 – Jul. 6, 2019): [PDF]
Amazon (Jun. 19, 2018): [PDF]
Allen Institute for Artificial Intelligence (AI2) (Jun. 12, 2018): [PDF]
Dept. of Statistics, University of Washington (May 11, 2018): [PDF]
Microsoft Research, Redmond (Apr. 5, 2018): [PDF]
Available under the BSD 3-clause license at https://github.com/UCMerced-ML/LC-model-compression. This implements nearly all the algorithms described in the papers below.
Carreira-Perpiñán, M. Á. (2017): "Model compression as constrained optimization, with application to neural nets. Part I: general framework". Unpublished manuscript, Jul. 5, 2017, arXiv:1707.01209.
[external link] [paper preprint]
Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2017): "Model compression as constrained optimization, with application to neural nets. Part II: quantization". Unpublished manuscript, Jul. 13, 2017, arXiv:1707.04319.
[external link] [paper preprint]
Carreira-Perpiñán, M. Á. and Zharmagambetov, A. (2018): "Fast model compression". Bay Area Machine Learning Symposium (BayLearn 2018).
[external link] [paper preprint] [poster]
Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2018): ""Learning-Compression" algorithms for neural net pruning". IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2018), pp. 8532–8541.
[external link] [paper preprint] [poster] [supplementary material] [Python implementation (old version)] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2020): "Low-rank compression of neural nets: learning the rank of each layer". IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2020), pp. 8046–8056.
[external link] [paper preprint] [poster] [supplementary material] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2020): "A flexible, extensible software framework for model compression based on the LC algorithm". Unpublished manuscript, May 15, 2020, arXiv:2005.07786.
[external link] [paper preprint] [Python implementation]
Extended abstract at the Bay Area Machine Learning Symposium (BayLearn 2020): [paper preprint] [video]
Short version at the 2nd On-Device Intelligence Workshop (MLSys 2021): [external link] [paper preprint] [slides] [video]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): "Neural network compression via additive combination of reshaped, low-rank matrices". Data Compression Conference (DCC 2021), pp. 243–252.
[external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): "Optimal selection of matrix shape and decomposition scheme for neural network compression". IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2021), pp. 3250–3254.
[external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): "An empirical comparison of quantization, pruning and low-rank neural network compression using the LC toolkit". Int. Joint Conf. on Neural Networks (IJCNN 2021).
[external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y., Molchanov, P., Shen, M., Yin, H., Carreira-Perpiñán, M. Á. and Alvarez, J. M. (2021): "Optimal quantization using scaled codebook". IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2021), pp. 12090–12099.
[external link] [paper preprint] [slides] [poster] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): "More general and effective model compression via an additive combination of compressions". 32nd European Conf. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2021), pp. 233–248.
[external link] [paper preprint] [slides] [Python implementation]
Longer version: Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2021): "Model compression as constrained optimization, with application to neural nets. Part V: combining compressions". Jul. 9, 2021, arXiv:2107.04380: [external link] [paper preprint]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): "Beyond FLOPs in low-rank compression of neural networks: optimizing device-specific inference runtime". IEEE Int. Conf. Image Processing (ICIP 2021), pp. 2843–2847.
[external link] [paper preprint] [slides] [poster] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): "LC: A flexible, extensible open-source toolkit for model compression". Conference on Information and Knowledge Management (CIKM 2021), resource paper, pp. 4504–4514.
[external link] [paper preprint] [slides] [Python implementation]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2022): "Exploring the effect of ℓ0/ℓ2 regularization in neural network pruning using the LC toolkit". IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2022), pp. 3373–3377.
[external link] [paper preprint] [slides] [poster] [Python implementation] [© IEEE]