CSE282 Advanced Topics in Machine Learning (Fall semester 2008)
Instructor
Miguel Á. Carreira-Perpiñán
Assistant Professor
Electrical Engineering and Computer Science
School of Engineering
University of California, Merced
mcarreira-perpinan-[at]-ucmerced.edu; 209-228-4545
Office: 284, Science & Engineering Building
Office hours: by appointment (call or email, including [CSE282] in the subject).
Lectures: Tuesdays/Thursdays 1:30-2:45pm (Classroom Building 209)
Lab class: Fridays 1:30-4:15pm (Linux Lab, SE138)
Course web page: http://faculty.ucmerced.edu/mcarreira-perpinan/CSE282
Course description
The course reviews advanced topics in machine learning. Machine learning is the study of models and algorithms that extract information from data. Machine learning ideas underlie many algorithms in computer vision, speech processing, bioinformatics, robotics, computer graphics and other areas. The 2008 edition of the course will focus on dimensionality reduction and manifold learning.
Prerequisites: the course is intended for graduate students who have taken an introductory course in machine learning (such as CSE276).
Textbook
There is no required textbook. Selected readings will appear on this web page in due course. The following are two reviews of dimensionality reduction and manifold learning techniques:
- M. Á. Carreira-Perpiñán (2001): Continuous latent variable models for dimensionality reduction and sequential data reconstruction. PhD thesis, University of Sheffield, UK.
- Chapter 2: The continuous latent variable modelling formalism.
This contains a review of continuous latent variable models: probabilistic principal component analysis (PCA), factor analysis, the generative topographic mapping (GTM), independent component analysis (ICA), mixtures of latent variable models, etc. It also deals with issues such as parameter estimation, identifiability, interpretability, visualisation, and dimensionality reduction with continuous latent variable models.
- Chapter 4: Dimensionality reduction.
This contains a review of dimensionality reduction with nonprobabilistic methods (probabilistic methods, i.e., latent variable models, are reviewed in chapter 2): nonlinear autoassociators, kernel PCA, principal curves, vector quantisation, multidimensional scaling, Isomap, LLE, etc. It also reviews issues such as the curse of dimensionality and the intrinsic dimensionality.
- L. K. Saul, K. Q. Weinberger, J. H. Ham, F. Sha and D. D. Lee (2006): "Spectral methods for dimensionality reduction", in Semi-Supervised Learning (O. Chapelle, B. Schölkopf and A. Zien, eds.), MIT Press, pp. 293-308.
Other books on general machine learning:
- Christopher M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
- David J. C. MacKay: Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
- Bernhard Schölkopf and Alexander J. Smola: Learning with Kernels. MIT Press, 2001.
- Trevor J. Hastie, Robert J. Tibshirani and Jerome H. Friedman: The Elements of Statistical Learning. Springer, 2001.
- Richard O. Duda, Peter E. Hart and David G. Stork: Pattern Classification, second ed. Wiley, 2001.
- Aapo Hyvärinen, Juha Karhunen and Erkki Oja: Independent Component Analysis. Wiley, 2001.
Readings
- Autoencoders and related methods (Oct. 28):
- Bourlard and Kamp: "Auto-association by multilayer perceptrons and singular value decomposition". Biological Cybernetics, 1988.
- Baldi and Hornik: "Neural networks and principal component analysis: learning from examples without local minima". Neural Networks, 1989.
- Saund: "Dimensionality-reduction using connectionist networks". IEEE Trans. PAMI, 1989.
- Kramer: "Nonlinear principal component analysis using autoassociative neural networks". AIChE Journal, 1991.
- DeMers and Cottrell: "Non-linear dimensionality reduction". NIPS, 1992.
- Kung et al: "Adaptive Principal Component EXtraction (APEX) and applications". IEEE Trans. SP, 1994.
- Hecht-Nielsen: "Replicator neural networks for universal optimal source coding". Science, 1995.
- Malthouse: "Limitations of nonlinear PCA as performed with generic neural networks". IEEE Trans. NN, 1998.
- Hadsell et al: "Dimensionality reduction by learning an invariant mapping". CVPR, 2006.
- Hinton and Salakhutdinov: "Reducing the dimensionality of data with neural networks". Science, 2006. See also Perspective and code.
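A minimal Matlab sketch of the result in the first two readings above (the data and sizes below are invented for illustration): the optimal linear autoassociator reconstructs its input through the top principal subspace, which the SVD of the centred data gives directly, with no network training needed.
  % Linear autoassociation vs. PCA (cf. Bourlard & Kamp 1988; Baldi & Hornik 1989):
  % the best linear "bottleneck" reconstruction is the projection onto the
  % top principal subspace, obtained directly from the SVD.
  N = 500;
  X = randn(N,2) * [1 0 0.5; 0 1 -0.5] + 0.05*randn(N,3);  % 3D points near a 2D plane
  Xc = X - repmat(mean(X,1), N, 1);                        % centre the data
  [U,S,V] = svd(Xc, 'econ');
  W = V(:,1:2);              % "encoder": top-2 principal directions (3x2)
  Z = Xc * W;                % 2D codes (the bottleneck activations)
  Xhat = Z * W';             % "decoder": linear reconstruction back in 3D
  mse = mean(sum((Xc - Xhat).^2, 2))   % small residual: the plane is recovered
Nonlinear autoencoders (Kramer; Hinton and Salakhutdinov) go beyond this linear optimum, but the sketch shows the baseline they must beat.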
- Mixture models, local PCA (Nov. 4):
- Ghahramani and Hinton: "The EM algorithm for mixtures of factor analyzers". U. of Toronto tech. rep. CRG-TR-96-1, 1996. Code.
- Hinton et al: "Modeling the manifolds of images of handwritten digits". IEEE Trans. NN, 1997.
- Kambhatla and Leen: "Dimension reduction by local principal component analysis". Neural Computation, 1997.
- Tipping and Bishop: "Probabilistic principal component analysis". J. Royal Stat. Soc. B, 1999.
- Tipping and Bishop: "Mixtures of probabilistic principal component analyzers". Neural Computation, 1999. NETLAB code.
- Roweis et al: "Global coordination of local linear models". NIPS, 2001.
- Brand: "Charting a manifold". NIPS, 2002.
- Teh and Roweis: "Automatic alignment of local representations". NIPS, 2002.
- Vidal et al: "Generalized principal component analysis (GPCA)". IEEE Trans. PAMI, 2005. Code.
- Verbeek: "Learning nonlinear image manifolds by global alignment of local linear models". IEEE Trans. PAMI, 2006.
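The idea behind local PCA can be sketched in a few lines of Matlab, in the spirit of Kambhatla and Leen (1997); the toy data, number of clusters and local dimension below are arbitrary choices for illustration, not those of any paper above.
  % Local PCA: k-means partition, then a separate 1D PCA reconstruction
  % inside each cluster, giving a piecewise-linear fit to a curved manifold.
  N = 400; t = linspace(0, pi, N)';
  X = [cos(t) sin(t)] + 0.02*randn(N,2);      % noisy arc: a curved 1D manifold
  K = 4;                                      % number of local models
  C = X(round(linspace(1,N,K)), :);           % centroids seeded along the data
  for it = 1:20                               % plain k-means
    D2 = zeros(N,K);
    for k = 1:K
      D2(:,k) = sum((X - repmat(C(k,:),N,1)).^2, 2);
    end
    [dmin, idx] = min(D2, [], 2);
    for k = 1:K
      if any(idx == k), C(k,:) = mean(X(idx==k,:), 1); end
    end
  end
  Xhat = zeros(size(X));
  for k = 1:K                                 % 1D PCA within each cluster
    Xk = X(idx==k,:); nk = size(Xk,1); mu = mean(Xk,1);
    [U,S,V] = svd(Xk - repmat(mu,nk,1), 'econ');
    P = V(:,1);                               % local principal direction
    Xhat(idx==k,:) = repmat(mu,nk,1) + (Xk - repmat(mu,nk,1)) * (P*P');
  end
  mse = mean(sum((X - Xhat).^2, 2))           % piecewise-linear fit to the arc
The mixture-of-PPCA and alignment papers above replace the hard k-means partition with probabilistic responsibilities and glue the local charts into a global coordinate system.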
- Unsupervised regression (Nov. 11):
- More methods (Nov. 17):
- Spectral methods and out-of-sample extensions (Nov. 24):
- Tenenbaum et al: "A global geometric framework for nonlinear dimensionality reduction". Science, 2000. See also Perspective and code.
- Roweis and Saul: "Nonlinear dimensionality reduction by Locally Linear Embedding", Science, 2000; and Saul and Roweis: "Think globally, fit locally: unsupervised learning of low dimensional manifolds", JMLR, 2003. Code.
- Belkin and Niyogi: "Laplacian Eigenmaps for dimensionality reduction and data representation". Neural Computation, 2003. Code.
- Donoho and Grimes: "Hessian Eigenmaps: locally linear embedding techniques for high-dimensional data". PNAS, 2003.
- He and Niyogi: "Locality Preserving Projections". NIPS, 2003.
- Zhang and Zha: "Principal manifolds and nonlinear dimension reduction via Local Tangent Space Alignment". SIAM J. Sci. Comput., 2004.
- Bengio et al: "Learning eigenfunctions links spectral embedding and kernel PCA". Neural Computation, 2004.
- Coifman et al: "Geometric diffusions as a tool for harmonic analysis and structure definition of data", parts I ("Diffusion maps") and II ("Multiscale methods"). PNAS, 2005.
- Weinberger and Saul: "Unsupervised learning of image manifolds by semidefinite programming". IJCV, 2006.
- Carreira-Perpiñán and Lu: "The Laplacian Eigenmaps Latent Variable Model". AISTATS, 2007.
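To make the spectral recipe concrete, here is a minimal Matlab sketch of Laplacian eigenmaps (cf. Belkin and Niyogi, 2003) on a toy curve; the binary k-nearest-neighbour weighting is the simplest variant the paper discusses, and all data and parameter values are invented for illustration.
  % Laplacian eigenmaps: k-nearest-neighbour graph, unnormalised graph
  % Laplacian, then embed with the bottom nontrivial eigenvector.
  N = 300; t = sort(rand(N,1)) * 3*pi/2;
  X = [cos(t) sin(t)] + 0.01*randn(N,2);        % noisy arc in 2D
  G = X*X'; d2 = diag(G);
  D2 = repmat(d2,1,N) + repmat(d2',N,1) - 2*G;  % pairwise squared distances
  k = 8;
  [Ds, ord] = sort(D2, 2);
  W = zeros(N);
  for i = 1:N
    W(i, ord(i, 2:k+1)) = 1;                    % k nearest neighbours (skip self)
  end
  W = max(W, W');                               % symmetrise the graph
  L = diag(sum(W,2)) - W;                       % unnormalised graph Laplacian
  [V, E] = eig((L+L')/2);
  [evals, p] = sort(diag(E)); V = V(:, p);
  y = V(:, 2);                                  % skip the constant eigenvector
  corrcoef(y, t)                                % |corr| near 1: arc ordering recovered
Isomap, LLE, Hessian eigenmaps and LTSA follow the same template with different graph matrices; the out-of-sample papers (Bengio et al.; Carreira-Perpiñán and Lu) address how to embed points not seen during the eigendecomposition.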
- Applications (Dec. 1):
- Harshman et al: "Factor analysis of tongue shapes". J. Acoust. Soc. Amer., 1977.
- Belhumeur et al: "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection". IEEE Trans. PAMI, 1997.
- Carreira-Perpiñán and Renals: "Dimensionality reduction of electropalatographic data using latent variable models". Speech Communication, 1998.
- Kohonen et al: "Self organization of a massive document collection". IEEE Trans. Neural Networks, 2000. Demo.
- Karni and Gotsman: "Spectral compression of mesh geometry". SIGGRAPH, 2000.
- Keogh et al: "Dimensionality reduction for fast similarity search in large time series databases". Knowledge and Information Systems, 2001.
- Carreira-Perpiñán and Goodhill: "Influence of lateral connections on the structure of cortical maps". J. Neurophysiology, 2004.
- Grochow et al: "Style-based inverse kinematics". SIGGRAPH, 2004.
- Ji and Zha: "Sensor positioning in wireless ad-hoc sensor networks with multidimensional scaling". INFOCOM, 2004.
- Costa et al: "Distributed weighted-multidimensional scaling for node localization in sensor networks". ACM Trans. Sensor Networks, 2006.
- Lu et al: "People tracking with the Laplacian Eigenmaps Latent Variable Model". NIPS, 2007.
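Several of the sensor-network papers above build on classical multidimensional scaling: from pairwise distances alone, node positions are recoverable up to rotation, reflection and translation. A minimal Matlab sketch with invented toy data (not the algorithms of Ji and Zha or Costa et al., which also handle noisy and missing ranges):
  % Classical MDS for sensor localisation: recover 2D node positions
  % (up to a rigid motion) from the squared-distance matrix alone.
  N = 50; P = rand(N,2) * 10;                   % true sensor positions (unknown)
  G = P*P'; d2 = diag(G);
  D2 = repmat(d2,1,N) + repmat(d2',N,1) - 2*G;  % what the network can measure
  J = eye(N) - ones(N)/N;                       % centring matrix
  B = -0.5 * J * D2 * J;                        % double-centred Gram matrix
  [V, E] = eig((B+B')/2);
  [evals, p] = sort(diag(E), 'descend'); V = V(:, p);
  Y = V(:,1:2) * diag(sqrt(evals(1:2)));        % recovered coordinates
  % Y matches P up to a rigid motion; check that the distances agree:
  GY = Y*Y'; dy = diag(GY);
  max(max(abs(D2 - (repmat(dy,1,N) + repmat(dy',N,1) - 2*GY))))  % ~ 0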
Dimensionality reduction and manifold learning links
- Matrix identities (handy formulas for matrix derivatives, inverses, etc.):
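Two examples of the kind of identity collected in such references (standard results, stated here for convenience):
  (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}   (Sherman-Morrison-Woodbury)
  d(log |det A|)/dA = A^{-T}   (A invertible)
The Woodbury identity in particular appears throughout the latent variable model literature, where it reduces a high-dimensional covariance inverse to a low-dimensional one.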
Matlab tutorials
If you have never used Matlab, there are many online tutorials, for example:
Other links