This page collects material about the *method of auxiliary coordinates (MAC)*, a mathematical strategy to optimise "nested" systems, such as deep neural nets, without using chain-rule gradients (backpropagation), that reuses single-layer algorithms, handles non-differentiable layers and introduces significant parallelism.

We have used related ideas to MAC in our "learning-compression" algorithm for neural net compression.

This work has been done in collaboration with my past and present students and collaborators Mehdi Alizadeh, Magzhan Gabidolla, Zhengdong Lu, Ramin Raziperchikolaei, Max Vladymyrov, Weiran Wang and Arman Zharmagambetov.

It has been funded in part by: - NSF award IIS #1423515 (2014-2017): "Algorithms for accelerating optimization in deep learning".
- NSF award IIS #2007147 (2020-2023): "The TAO algorithm: principled, efficient optimization of decision trees, forests, tree-based neural nets, and beyond".
- Google Faculty Research Award (2013-2014): "Parallelization of deep neural net training using the method of auxiliary coordinates".
- NSF CAREER award IIS #0546857 and #0754089 (2006-2011): "Machine learning approaches for articulatory inversion".
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. |

Second Summer School on Optimization, Big Data and Applications (OBA) (Jun. 30 - Jul. 6, 2019): [PDF]

Invited keynote, CVPR 2016 Third Workshop

*DeepVision: Deep learning in computer vision*: [PDF] [video]Electrical Engineering & Computer Science, UC Merced (Jan. 31, 2014): [PDF]

Machine Learning Department, CMU (Oct. 28, 2013): [PDF] [video]

Carreira-Perpiñán, M. Á. and Wang, W. (2012): "Distributed optimization of deeply nested systems". Unpublished manuscript, Dec. 24, 2012, arXiv:1212.5921 [cs.LG].

[external link] [paper preprint]

This introduces a new, general mathematical technique, the*method of auxiliary coordinates (MAC)*, to train deep nets and other "nested" systems. It results in a*coordination-minimisation (CM) algorithm*that alternates training individual layers (reusing existing algorithms) with glueing layers (using auxiliary coordinates). It is provably convergent, simple to implement, has embarrassing parallelism within each step, and can apply even if derivatives are not available. It can also do model selection "on the fly" and search over architectures while it trains.Carreira-Perpiñán, M. Á. and Wang, W. (2014): "Distributed optimization of deeply nested systems".

*17th Int. Conf. Artificial Intelligence and Statistics (AISTATS 2014)*, pp. 10-19.

**Notable paper award**.

[external link] [paper preprint] [supplementary material] [slides] [video] [poster] [Matlab implementation (coming soon)]

This is a shorter version of the 2012 arXiv paper.

Zharmagambetov, A. and Carreira-Perpiñán, M. Á. (2022): "Semi-supervised learning with decision trees: graph Laplacian tree alternating optimization".

*Advances in Neural Information Processing Systems 35 (NeurIPS 2022)*, to appear.

[external link] [paper preprint] [supplementary material] [poster] [video]

Extended abstract at the*Bay Area Machine Learning Symposium (BayLearn 2022)*: [paper preprint] [poster]

Gabidolla, M. and Carreira-Perpiñán, M. Á. (2022): "Optimal interpretable clustering using oblique decision trees".

*ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2022)*, pp. 400-410.

[external link] [paper preprint] [slides] [poster]Zharmagambetov, A. and Carreira-Perpiñán, M. Á. (2022): "Interpretable dimensionality reduction via tree-based parametric embeddings".

*25th Int. Conf. Artificial Intelligence and Statistics (AISTATS 2022)*, pp. 9550-9570.

[external link] [paper preprint] [slides] [poster] [animations]Carreira-Perpiñán, M. Á. and Alizadeh, M. (2019): "ParMAC: distributed optimisation of nested functions, with application to learning binary autoencoders".

*Proc. Machine Learning and Systems 1 (MLSys 2019)*, pp. 276-288.

[external link] [paper preprint] [slides] [poster] [C/C++/MPI implementation] [video]

Longer version: May 30, 2016, arXiv:1605.09114 [cs.LG] [external link] [paper preprint]

This proposes a parallel, distributed optimisation model for the method of auxiliary coordinates (MAC) and implements it in MPI to learn binary autoencoders. We give theoretical conditions for the parallel speedup to be nearly perfect and verify them experimentally in a computer cluster.Raziperchikolaei, R. and Carreira-Perpiñán, M. Á. (2016): "Optimizing affinity-based binary hashing using auxiliary coordinates".

*Advances in Neural Information Processing Systems 29 (NIPS 2016)*, pp. 640-648.

[external link] [paper preprint] [supplementary material] [poster] [Matlab implementation (coming soon)]

Longer version: Feb. 5, 2016, arXiv:1501.05352 [cs.LG] [external link] [paper preprint]

This uses the method of auxiliary coordinates (MAC) to learn an arbitrary binary hash function (i.e., a function whose output is a vector of binary values) with an arbitrary affinity-based loss function.Carreira-Perpiñán, M. Á. and Max Vladymyrov (2015): "A fast, universal algorithm to learn parametric nonlinear embeddings".

*Advances in Neural Information Processing Systems 28 (NIPS 2015)*, pp. 253-261.

[external link] [paper preprint] [supplementary material] [poster] [Matlab implementation (coming soon)]

This uses the method of auxiliary coordinates (MAC) to learn an optimal mapping (such as linear or a neural net) for a nonlinear embedding (such as the elastic embedding or*t*-SNE).Carreira-Perpiñán, M. Á. and Raziperchikolaei, R. (2015): "Hashing with binary autoencoders".

*IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2015)*, pp. 557-566.

[external link] [paper preprint] [slides] [poster] [Matlab implementation] [Weka implementation]

Longer version: Jan. 5, 2015, arXiv:1501.00756 [cs.LG] [external link] [paper preprint] [slides]

This uses the method of auxiliary coordinates (MAC) to learn an autoencoder with binary hidden units, whose encoder gives good binary hash functions for image retrieval.Wang, W. and Carreira-Perpiñán, M. Á. (2014): "The role of dimensionality reduction in classification".

*28th AAAI Conference on Artificial Intelligence (AAAI 2014)*, pp. 2128-2134.

[external link] [paper preprint] [slides] [poster] [animations] [Matlab implementation] [Weka implementation]

Longer version: Wang, W. and Carreira-Perpiñán, M. Á. (2014): May 25, 2014, arXiv:1405.6444 [cs.LG] [external link] [paper preprint]

This uses the method of auxiliary coordinates (MAC) to learn nonlinear, low-dimensional features for a linear SVM. It also describes (possibly for the first time) the phenomenon of "neural collapse" popularised more recently in the neural networks literature, namely that the activations prior to a linear classifier layer collapse into point-like clusters arranged as the vertices of a simplex-like configuration. Specifically, we considered a composition*y = g(F(x))*of a nonlinear, infinitely flexible mapping*F*and a*K*-class linear classifier*g*. When jointly training*F*and*g*to optimise a classification loss, the*L*-dimensional outputs of*F*(called auxiliary coordinates in our paper) would cluster the training points of each class onto a single centroid, and these*K*centroids would form an optimal geometrical arrangement, as follows:*L*≥*K*-1: centroids at the vertices of a regular simplex (with perfect classification).*L*<*K*-1: centroids maximally separated on a hypersphere in dimension*L*(with perfect classification).*L*= 1 and*K*> 2: centroids on a suboptimal configuration (with imperfect classification).

This presentation (slides 51-61) considers logistic regression besides a linear SVM as classifier

*g*.Wang, W. and Carreira-Perpiñán, M. Á. (2012): "Nonlinear low-dimensional regression using auxiliary coordinates".

*15th Int. Conf. Artificial Intelligence and Statistics (AISTATS 2012)*, pp. 1295-1304.

[external link] [paper preprint] [poster] [animations]Carreira-Perpiñán, M. Á. and Lu, Z. (2011): "Manifold learning and missing data recovery through unsupervised regression".

*12th IEEE Int. Conf. Data Mining (ICDM 2011)*, pp. 1014-1019.

[external link] [paper preprint] [slides] [animations] [Matlab implementation (coming soon)] [© IEEE]Carreira-Perpiñán, M. Á. and Lu, Z. (2010): "Parametric dimensionality reduction by unsupervised regression".

*IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2010)*, pp. 1895-1902.

[external link] [paper preprint] [video] [poster] [animations] [Matlab implementation] [© IEEE]Carreira-Perpiñán, M. Á. and Lu, Z. (2008): "Dimensionality reduction by unsupervised regression".

*IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2008)*.

[external link] [paper preprint] [slides] [poster] [animations] [Matlab implementation] [© IEEE]

The preprint fixes three errata in the legends of fig. 2.

Carreira-Perpiñán, M. Á. (2019): "A tutorial on nested system optimisation using the method of auxiliary coordinates". arXiv:XXXX.XXXXX.

[external link] [paper preprint]Carreira-Perpiñán, M. Á. (2019): "A gallery of proximal operators arising in the method of auxiliary coordinates". arXiv:XXXX.XXXXX.

[external link] [paper preprint]

Miguel A. Carreira-Perpinan Last modified: Mon Jan 16 17:26:51 PST 2023