This page collects papers describing the "learning-compression" (LC) algorithm, a nonconvex optimisation algorithm for finding an optimally compressed deep neural net, i.e., one that minimises a loss (such as classification error) while being compressed as much as possible. The LC algorithm alternates two steps until convergence: a learning (L) step, which learns the neural net, and a compression (C) step, which compresses its parameters. It can handle many types of compression (quantization, binarization, pruning, low-rank decomposition, etc.) by simply calling the relevant compression routine in the C step.
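The alternation of L and C steps can be illustrated on a toy problem. The sketch below is not the LC toolkit's API: the names `c_step_prune`, `l_step` and `lc_algorithm` are hypothetical, the loss is a simple quadratic standing in for a neural net loss, and the coupling between the two steps is assumed to be a quadratic penalty with an increasing weight `mu`, which is one of the formulations discussed in the papers below. The C step here is magnitude pruning (keep the `k` largest weights); other compressions would replace only that function.

```python
def c_step_prune(w, k):
    """C step (illustrative): keep the k largest-magnitude weights, zero the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def l_step(w, delta, mu, loss_grad, lr=0.1, iters=200):
    """L step (illustrative): gradient descent on loss(w) + (mu / 2) * ||w - delta||^2."""
    step = lr / (1.0 + mu)  # shrink the step as the penalty gets stiffer
    for _ in range(iters):
        g = loss_grad(w)
        w = [wi - step * (gi + mu * (wi - di)) for wi, gi, di in zip(w, g, delta)]
    return w


def lc_algorithm(w_init, loss_grad, k, mu0=0.01, mu_growth=1.5, outer_iters=30):
    """Alternate L and C steps while increasing mu (assumed quadratic-penalty schedule)."""
    w = list(w_init)
    delta = c_step_prune(w, k)  # start from the compressed reference weights
    mu = mu0
    for _ in range(outer_iters):
        w = l_step(w, delta, mu, loss_grad)  # learn, pulled towards delta
        delta = c_step_prune(w, k)           # compress the current weights
        mu *= mu_growth                      # tighten the coupling w ~ delta
    return w, delta


# Toy loss 0.5 * ||w - target||^2; its gradient is w - target.
target = [3.0, -0.2, 0.5, -2.0, 0.1]
loss_grad = lambda w: [wi - ti for wi, ti in zip(w, target)]

w, delta = lc_algorithm(target, loss_grad, k=2)
print("compressed weights:", delta)
```

As `mu` grows, the learned weights `w` are driven towards the compressed weights `delta`, so the final `delta` is both feasible (at most `k` nonzeros) and adapted to the loss. Swapping pruning for quantization or a low-rank projection would change only the C step, which is the modularity described above.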
This work has been done in collaboration with my students Magzhan Gabidolla, Kuat Gazizov, Yerlan Idelbayev and Arman Zharmagambetov.
Second Summer School on Optimization, Big Data and Applications (OBA) (Jun. 30 - Jul. 6, 2019): [PDF]
Amazon (Jun. 19, 2018): [PDF]
Allen Institute for Artificial Intelligence (AI2) (Jun. 12, 2018): [PDF]
Dept. of Statistics, University of Washington (May 11, 2018): [PDF]
Microsoft Research, Redmond (Apr. 5, 2018): [PDF]
The code is available under the BSD 3-clause license at https://github.com/UCMerced-ML/LC-model-compression. It implements nearly all the algorithms described in the papers below.
Carreira-Perpiñán, M. Á. (2017): Model compression as constrained optimization, with application to neural nets. Part I: general framework. Unpublished manuscript, Jul. 5, 2017, arXiv:1707.01209.
          [external link] [paper preprint]
Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2017): Model compression as constrained optimization, with application to neural nets. Part II: quantization. Unpublished manuscript, Jul. 13, 2017, arXiv:1707.04319.
          [external link] [paper preprint]
Carreira-Perpiñán, M. Á. and Zharmagambetov, A. (2018): Fast model compression. Bay Area Machine Learning Symposium (BayLearn 2018).
	  [external link] [paper preprint] [poster]
Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2018): "Learning-Compression" algorithms for neural net pruning. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2018), pp. 8532-8541.
	  [external link] [paper preprint] [poster] [supplementary material] [Python implementation (old version)] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2020): Low-rank compression of neural nets: learning the rank of each layer. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2020), pp. 8046-8056.
	  [external link] [paper preprint] [poster] [supplementary material] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2020): A flexible, extensible software framework for model compression based on the LC algorithm. Unpublished manuscript, May 15, 2020, arXiv:2005.07786.
          [external link] [paper preprint] [Python implementation]
          Extended abstract at the Bay Area Machine Learning Symposium (BayLearn 2020): [paper preprint] [video]
          Short version at the 2nd On-Device Intelligence Workshop (MLSys 2021): [external link] [paper preprint] [slides] [video]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): Neural network compression via additive combination of reshaped, low-rank matrices. Data Compression Conference (DCC 2021), pp. 243-252.
	  [external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): Optimal selection of matrix shape and decomposition scheme for neural network compression. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2021), pp. 3250-3254.
	  [external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): An empirical comparison of quantization, pruning and low-rank neural network compression using the LC toolkit. Int. Joint Conf. on Neural Networks (IJCNN 2021).
	  [external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y., Molchanov, P., Shen, M., Yin, H., Carreira-Perpiñán, M. Á. and Alvarez, J. M. (2021): Optimal quantization using scaled codebook. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2021), pp. 12090-12099.
	  [external link] [paper preprint] [slides] [poster] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): More general and effective model compression via an additive combination of compressions. 32nd European Conf. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2021), pp. 233-248.
	  [external link] [paper preprint] [slides] [Python implementation]
          Longer version: Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2021): Model compression as constrained optimization, with application to neural nets. Part V: combining compressions. Jul. 9, 2021, arXiv:2107.04380: [external link] [paper preprint]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): Beyond FLOPs in low-rank compression of neural networks: optimizing device-specific inference runtime. IEEE Int. Conf. Image Processing (ICIP 2021), pp. 2843-2847.
	  [external link] [paper preprint] [slides] [poster] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): LC: A flexible, extensible open-source toolkit for model compression. Conference on Information and Knowledge Management (CIKM 2021), resource paper, pp. 4504-4514.
	  [external link] [paper preprint] [slides] [Python implementation]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2022): Exploring the effect of l0/l2 regularization in neural network pruning using the LC toolkit. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2022), pp. 3373-3377.
	  [external link] [paper preprint] [slides] [poster] [Python implementation] [© IEEE]
Gazizov, K., Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2025): Emerging aspects in ResNet quantization with adaptive codebook sizes. Bay Area Machine Learning Symposium (BayLearn 2025).
	  [external link] [paper preprint] [slides] [poster] [video]