This page collects material about the *method of auxiliary coordinates (MAC)*, a mathematical strategy to optimise "nested" systems, such as deep neural nets, without using chain-rule gradients (backpropagation), that reuses single-layer algorithms, handles non-differentiable layers and introduces significant parallelism.

We have used related ideas to MAC in our "learning-compression" algorithm for neural net compression.

This work has been done in collaboration with my past and present students and collaborators Mehdi Alizadeh, Zhengdong Lu, Ramin Raziperchikolaei, Max Vladymyrov and Weiran Wang.

It has been funded in part by:

- NSF award IIS #1423515 (2014-2017): "Algorithms for accelerating optimization in deep learning".
- Google Faculty Research Award (2013-2014): "Parallelization of deep neural net training using the method of auxiliary coordinates".
- NSF CAREER award IIS #0546857 and #0754089 (2006-2011): "Machine learning approaches for articulatory inversion".

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Invited keynote, CVPR 2016 Third Workshop

*DeepVision: Deep learning in computer vision*: [PDF] [video]Electrical Engineering & Computer Science, UC Merced (Jan. 31, 2014): [PDF]

Machine Learning Department, CMU (Oct. 28, 2013): [PDF] [video]

Carreira-Perpiñán, M. Á. (2019): "A tutorial on nested system optimisation using the method of auxiliary coordinates". arXiv:XXXX.XXXXX.

[external link] [paper preprint]Carreira-Perpiñán, M. Á. (2019): "A gallery of proximal operators arising in the method of auxiliary coordinates". arXiv:XXXX.XXXXX.

[external link] [paper preprint]

Carreira-Perpiñán, M. Á. and Wang, W. (2012): "Distributed optimization of deeply nested systems". Unpublished manuscript, Dec. 24, 2012, arXiv:1212.5921 [cs.LG].

[external link] [paper preprint]

This introduces a new, general mathematical technique, the*method of auxiliary coordinates (MAC)*, to train deep nets and other "nested" systems. It results in a*coordination-minimisation (CM) algorithm*that alternates training individual layers (reusing existing algorithms) with glueing layers (using auxiliary coordinates). It is provably convergent, simple to implement, has embarrassing parallelism within each step, and can apply even if derivatives are not available. It can also do model selection "on the fly" and search over architectures while it trains.Carreira-Perpiñán, M. Á. and Wang, W. (2014): "Distributed optimization of deeply nested systems".

*17th Int. Conf. Artificial Intelligence and Statistics (AISTATS 2014)*, pp. 10-19.

**Notable paper award**.

[external link] [paper preprint] [supplementary material] [slides] [video] [poster] [Matlab implementation (coming soon)]

This is a shorter version of the 2012 arXiv paper.

Carreira-Perpiñán, M. Á. and Alizadeh, M. (2019): "ParMAC: distributed optimisation of nested functions, with application to learning binary autoencoders".

*Conference on Systems and Machine Learning (SysML 2019)*, to appear.

[external link] [paper preprint] [slides] [poster] [C/C++/MPI implementation]

Longer version: May 30, 2016, arXiv:1605.09114 [cs.LG] [external link] [paper preprint]

This proposes a parallel, distributed optimisation model for the method of auxiliary coordinates (MAC) and implements it in MPI to learn binary autoencoders. We give theoretical conditions for the parallel speedup to be nearly perfect and verify them experimentally in a computer cluster.Raziperchikolaei, R. and Carreira-Perpiñán, M. Á. (2016): "Optimizing affinity-based binary hashing using auxiliary coordinates".

*Advances in Neural Information Processing Systems 29 (NIPS 2016)*, pp. 640-648.

[external link] [paper preprint] [supplementary material] [poster] [Matlab implementation (coming soon)]

Longer version: Feb. 5, 2016, arXiv:1501.05352 [cs.LG] [external link] [paper preprint]

This uses the method of auxiliary coordinates (MAC) to learn an arbitrary binary hash function (i.e., a function whose output is a vector of binary values) with an arbitrary affinity-based loss function.Carreira-Perpiñán, M. Á. and Max Vladymyrov (2015): "A fast, universal algorithm to learn parametric nonlinear embeddings".

*Advances in Neural Information Processing Systems 28 (NIPS 2015)*, pp. 253-261.

[external link] [paper preprint] [supplementary material] [poster] [Matlab implementation (coming soon)]

This uses the method of auxiliary coordinates (MAC) to learn an optimal mapping (such as linear or a neural net) for a nonlinear embedding (such as the elastic embedding or*t*-SNE).Carreira-Perpiñán, M. Á. and Raziperchikolaei, R. (2015): "Hashing with binary autoencoders".

*IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2015)*, pp. 557-566.

[external link] [paper preprint] [slides] [poster] [Matlab implementation] [Weka implementation]

Longer version: Jan. 5, 2015, arXiv:1501.00756 [cs.LG] [external link] [paper preprint] [slides]

This uses the method of auxiliary coordinates (MAC) to learn an autoencoder with binary hidden units, whose encoder gives good binary hash functions for image retrieval.Wang, W. and Carreira-Perpiñán, M. Á. (2014): "The role of dimensionality reduction in classification".

*Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014)*, pp. 2128-2134.

[external link] [paper preprint] [slides] [poster] [animations] [Matlab implementation] [Weka implementation]

Longer version: Wang, W. and Carreira-Perpiñán, M. Á. (2014): May 25, 2014, arXiv:1405.6444 [cs.LG] [external link] [paper preprint]

This uses the method of auxiliary coordinates (MAC) to learn nonlinear, low-dimensional features for a linear SVM.Wang, W. and Carreira-Perpiñán, M. Á. (2012): "Nonlinear low-dimensional regression using auxiliary coordinates".

*15th Int. Conf. Artificial Intelligence and Statistics (AISTATS 2012)*, pp. 1295-1304.

[external link] [paper preprint] [poster] [animations]Carreira-Perpiñán, M. Á. and Lu, Z. (2011): "Manifold learning and missing data recovery through unsupervised regression".

*12th IEEE Int. Conf. Data Mining (ICDM 2011)*, pp. 1014-1019.

[external link] [paper preprint] [slides] [animations] [Matlab implementation (coming soon)] [© IEEE]Carreira-Perpiñán, M. Á. and Lu, Z. (2010): "Parametric dimensionality reduction by unsupervised regression".

*IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2010)*, pp. 1895-1902.

[external link] [paper preprint] [poster] [animations] [Matlab implementation] [© IEEE]Carreira-Perpiñán, M. Á. and Lu, Z. (2008): "Dimensionality reduction by unsupervised regression".

*IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2008)*.

[external link] [paper preprint] [slides] [poster] [animations] [Matlab implementation] [© IEEE]

The preprint fixes three errata in the legends of fig. 2.

Miguel A. Carreira-Perpinan Last modified: Sun Mar 3 20:48:11 PST 2019