Neural net compression using the "learning-compression" (LC) algorithm

This page collects papers describing the "learning-compression" (LC) algorithm, a nonconvex optimisation algorithm for finding an optimally compressed deep neural net, i.e., which minimises a loss (such as classification error) while being compressed as much as possible. The LC algorithm alternates two steps till convergence: a learning (L) step, which learns the neural net, and a compression (C) step, which compresses its parameters. It can handle many types of compression (quantization, binarization, pruning, low-rank decomposition, etc.) by simply calling the relevant compression routine in the C step.

This work has been done in collaboration with my students Magzhan Gabidolla, Kuat Gazizov, Yerlan Idelbayev and Arman Zharmagambetov.

It has been funded in part by:

NSF award IIS #1423515 (2014-2017): Algorithms for accelerating optimization in deep learning.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
GPU donations by the NVIDIA Corporation.

Selected presentations

Second Summer School on Optimization, Big Data and Applications (OBA) (Jun. 30 - Jul. 6, 2019): [PDF]
Amazon (Jun. 19, 2018): [PDF]
Allen Institute for Artificial Intelligence (AI2) (Jun. 12, 2018): [PDF]
Dept. of Statistics, University of Washington (May 11, 2018): [PDF]
Microsoft Research, Redmond (Apr. 5, 2018): [PDF]

Software

Available under the BSD 3-clause license at https://github.com/UCMerced-ML/LC-model-compression. This implements nearly all the algorithms described in the papers below.

References

Carreira-Perpiñán, M. Á. (2017): Model compression as constrained optimization, with application to neural nets. Part I: general framework. Unpublished manuscript, Jul. 5, 2017, arXiv:1707.01209.
[external link] [paper preprint]
Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2017): Model compression as constrained optimization, with application to neural nets. Part II: quantization. Unpublished manuscript, Jul. 13, 2017, arXiv:1707.04319.
[external link] [paper preprint]
Carreira-Perpiñán, M. Á. and Zharmagambetov, A. (2018): Fast model compression. Bay Area Machine Learning Symposium (BayLearn 2018).
[external link] [paper preprint] [poster]
Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2018): "Learning-Compression" algorithms for neural net pruning. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2018), pp. 8532--8541.
[external link] [paper preprint] [poster] [supplementary material] [Python implementation (old version)] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2020): Low-rank compression of neural nets: learning the rank of each layer. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2020), pp. 8046-8056.
[external link] [paper preprint] [poster] [supplementary material] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2020): A flexible, extensible software framework for model compression based on the LC algorithm. Unpublished manuscript, May 15, 2020, arXiv:2005.07786.
[external link] [paper preprint] [Python implementation]
Extended abstract at the Bay Area Machine Learning Symposium (BayLearn 2020): [paper preprint] [video]
Short version at the 2nd On-Device Intelligence Workshop (MLSys 2021): [external link] [paper preprint] [slides] [video]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): Neural network compression via additive combination of reshaped, low-rank matrices. Data Compression Conference (DCC 2021), pp. 243-252.
[external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): Optimal selection of matrix shape and decomposition scheme for neural network compression. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2021), pp. 3250-3254.
[external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): An empirical comparison of quantization, pruning and low-rank neural network compression using the LC toolkit. Int. Joint Conf. on Neural Networks (IJCNN 2021).
[external link] [paper preprint] [slides] [Python implementation] [© IEEE]
Idelbayev, Y., Molchanov, P., Shen, M., Yin, H., Carreira-Perpiñán, M. Á. and Alvarez, J. M. (2021): Optimal quantization using scaled codebook. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2021), pp. 12090-12099.
[external link] [paper preprint] [slides] [poster] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): More general and effective model compression via an additive combination of compressions. 32nd European. Conf. Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2021), pp. 233-248.
[external link] [paper preprint] [slides] [Python implementation]
Longer version: Carreira-Perpiñán, M. Á. and Idelbayev, Y. (2021): Model compression as constrained optimization, with application to neural nets. Part V: combining compressions. Jul. 9, 2021, arXiv:2107.04380: [external link] [paper preprint]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): Beyond FLOPs in low-rank compression of neural networks: optimizing device-specific inference runtime. IEEE Int. Conf. Image Processing (ICIP 2021), pp. 2843-2847.
[external link] [paper preprint] [slides] [poster] [Python implementation] [© IEEE]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2021): LC: A flexible, extensible open-source toolkit for model compression. Conference on Information and Knowledge Management (CIKM 2021), resource paper, pp. 4504-4514.
[external link] [paper preprint] [slides] [Python implementation]
Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2022): Exploring the effect of l₀/l₂ regularization in neural network pruning using the LC toolkit. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2022), pp. 3373-3377.
[external link] [paper preprint] [slides] [poster] [Python implementation] [© IEEE]
Gazizov, K., Idelbayev, Y. and Carreira-Perpiñán, M. Á. (2025): Emerging aspects in ResNet quantization with adaptive codebook sizes. Bay Area Machine Learning Symposium (BayLearn 2025).
[external link] [paper preprint] [slides] [poster] [video]

Miguel A. Carreira-Perpinan

Last modified: Thu Oct 9 19:57:37 PDT 2025

UC Merced | EECS | MACP's Home Page