- Overview
- Pictures of EPG prototypes
- Graphs for method comparison
- Two-dimensional visualisation of an utterance
- Software
- Links


Electropalatograms (EPGs) are typically 62-component binary vectors indicating the presence or absence of linguopalatal contact at particular moments of an utterance. Abundant data is available from the ACCOR II database, produced with the EPG system of the University of Reading. Traditional EPGs are plane views of the articulatory process; enhanced EPG systems providing 3D data are currently under development.

Dimensionality reduction of EPG data can be useful in several ways:

- Phoneme classification.
- Insight into the problem of the acoustic-to-articulatory mapping, in which values for articulatory parameters such as vocal tract area functions, lip positions and jaw dynamics are determined from the acoustic signal; a generative model of the EPG data can help here.
- Determining the intrinsic dimensionality of the EPG data.

Some ad hoc reduction strategies have been proposed for EPGs (Hardcastle et al. 1989, 1991), but little work using adaptive techniques has been done. We have used latent variable models and finite mixtures to fit maximum likelihood models to a subset of the ACCOR database. We show that these unsupervised learning methods can extract important structure from the EPG data and perform well in varying speech conditions (e.g. different speakers or different speech styles). In particular, nonlinear methods present a clear advantage over linear ones. You can find more about this research in the following **papers**:

- Carreira-Perpiñán, M. Á. and Renals, S. (1998): "Dimensionality reduction of electropalatographic data using latent variable models". *Speech Communication* **26**(4):259-282.
- Carreira-Perpiñán, M. Á. and Renals, S. (1998): "Experimental evaluation of latent variable models for dimensionality reduction". *Proc. of the 1998 IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP98)*, pp. 165-173, Cambridge, UK.

The subset of the ACCOR database that we used included EPG frames sampled from several different utterances by 6 English native speakers (FG, HD, KM, PD, RK, SN). This is what some typical EPG frames look like (the corresponding phoneme is shown below each frame):

Each 62-dimensional vector is represented in the customary way as a two-dimensional 8x8 image (where the top corners are unused): components 1-6 in the first row (alveoli), components 7-14 in the second,..., components 55-62 in the eighth row (velum). Each vector component is scaled to [-1,1] and plotted as follows:

- In colour images, in red if it is negative, in blue if it is positive and with an intensity proportional to its magnitude (this applies to the figures in the web page).
- In black and white images, as a white square if it is negative, as a black square if it is positive and with an area proportional to its magnitude (this applies to the figures in the papers).
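As a minimal sketch of the row layout just described (in Python rather than the Matlab of the software below; the function name `epg_to_grid` is ours, for illustration only), the 62 components can be placed into an 8x8 grid whose top corners are unused:

```python
# Lay out a 62-component EPG vector as an 8x8 grid, top corners unused,
# following the layout above: components 1-6 in the first row (alveoli),
# 7-14 in the second, ..., 55-62 in the eighth row (velum).
def epg_to_grid(v):
    assert len(v) == 62
    grid = [[None] * 8 for _ in range(8)]
    grid[0][1:7] = v[0:6]              # first row: 6 components, corners unused
    for r in range(1, 8):              # rows 2-8: 8 components each
        grid[r] = list(v[6 + 8 * (r - 1): 6 + 8 * r])
    return grid
```

Plotting such a grid with the colour or grey-scale coding above then reproduces the EPG images shown in this page and in the papers.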

We partitioned the data set into 6 subsets, each of them corresponding to a different speaker (FG, HD, KM, PD, RK, SN). Each subset was itself split into a training (75% of the frames) and a test set (25%). For each speaker and using the training set, we found maximum likelihood estimates for the following models:
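The per-speaker partitioning can be sketched as follows (Python rather than Matlab; the helper name and the use of a fixed shuffling seed are our own assumptions, not part of the original procedure):

```python
import random

# Split one speaker's frames into 75% training / 25% test,
# as described above (one independent split per speaker).
def split_frames(frames, train_fraction=0.75, seed=0):
    idx = list(range(len(frames)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(frames)))
    train = [frames[i] for i in idx[:cut]]
    test = [frames[i] for i in idx[cut:]]
    return train, test
```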

- **Factor analysis (FA)**
- **Principal component analysis (PCA)**
- **Two-dimensional generative topographic mapping (GTM)**
- **Mixture of first-order factor analysers (MFA)**
- **Mixture of multivariate Bernoulli distributions (MB)**

The following picture shows, for speaker RK, the factors or prototypes extracted by FA, PCA, MFA and MB as follows:

- First row: *factor loadings* found in a 9th-order **factor analysis** (varimax-rotated).
- Second row: first 9 *principal components* (without rotation), or *eigenEPGs*.
- Third row: first 9 *principal components* (varimax-rotated).
- Fourth row: *mean vectors* and *factor loadings* for a 4-component **mixture of first-order factor analysers**. For each component, the picture on the left is the factor loadings vector (without rotation) and the one on the right the mean vector.
- Fifth row: *prototypes* for a 9-component **mixture of multivariate Bernoulli distributions** (without rotation).
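The Bernoulli-mixture prototypes are the component mean vectors found by EM. A compact sketch of such an EM loop (plain NumPy rather than the Matlab code distributed below; variable names are ours) is:

```python
import numpy as np

def mixture_bernoulli_em(X, M, n_iter=50, seed=0, eps=1e-6):
    """Fit an M-component mixture of multivariate Bernoulli distributions
    to binary data X (N x D) by EM; the prototypes are the mean vectors p."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(M, 1.0 / M)                    # mixing proportions
    p = rng.uniform(0.25, 0.75, size=(M, D))    # component means (prototypes)
    for _ in range(n_iter):
        # E step: posterior responsibilities from per-component log-likelihoods
        logp = (X @ np.log(p + eps).T
                + (1 - X) @ np.log(1 - p + eps).T
                + np.log(pi + eps))
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M step: reestimate proportions and prototypes
        Nm = R.sum(axis=0)
        pi = Nm / N
        p = (R.T @ X) / Nm[:, None]
    return pi, p
```

Each row of `p` can then be displayed as an 8x8 EPG image, which is what the fifth row of the figure shows.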

Similar pictures are available for the rest of the speakers: FG, HD, KM, PD, RK, SN.

The comparative performance of factor analysis (FA), principal component analysis (PCA), the two-dimensional generative topographic mapping (GTM), the mixture of first-order factor analysers (MFA) and the mixture of multivariate Bernoulli distributions (MB) is shown in the following graphs for speaker RK in terms of **log-likelihood** and **squared reconstruction error** in the **training set** and the **test set**:

Note that the X axis refers to the order of the factor analysis or principal component analysis, to the number of mixture components in the case of the mixture models, and to the square root of the number of basis functions in the case of the two-dimensional GTM.
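For PCA, the squared reconstruction error plotted in these graphs can be computed as below (a NumPy sketch with function names of our own choosing; the other models compute their errors analogously from their own reconstructions):

```python
import numpy as np

def pca_reconstruction_error(X, L):
    """Mean squared reconstruction error of X (N x D) when projected onto
    its first L principal components and mapped back to data space."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # eigenEPGs: right singular vectors of the centred data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Vt[:L].T                        # D x L basis of principal components
    Xrec = mu + (Xc @ U) @ U.T          # project down, then reconstruct
    return np.mean(np.sum((X - Xrec) ** 2, axis=1))
```

The error is nonincreasing in `L` and reaches zero when `L` equals the rank of the centred data, which is the behaviour the graphs show as the model order grows.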

Similar pictures are available for the rest of the speakers: FG, HD, KM, PD, RK, SN.

The following figure shows the projection onto a two-dimensional latent space of all the EPG frames from the highlighted fragment of the utterance "I prefer Kant to Hobbes for a good bedtime book", linking consecutive points by a line, for speaker RK. The phonemic transcription of the utterance is:

The left graph uses the latent space of factors 1 and 2, while the right one uses GTM (points are numbered consecutively). The start and end points are marked as * and o, respectively. The phonemes are those of the aforementioned figure.
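A projection of this kind can be sketched as follows for the linear case (NumPy rather than Matlab; the function name is ours, and we use the first two principal components as the two-dimensional latent space, whereas the figure's left panel uses factors 1 and 2 of a factor analysis):

```python
import numpy as np

def project_trajectory(frames):
    """Project a sequence of EPG frames (T x 62) onto the first two
    principal components, giving T points in a 2-D latent space that
    can be linked by a line in order of occurrence."""
    X = np.asarray(frames, dtype=float)
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    Z = (X - mu) @ Vt[:2].T    # T x 2 latent-space coordinates
    return Z                   # Z[0] is the start (*), Z[-1] the end (o)
```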

Similar pictures are available for the rest of the speakers: FG, HD, KM, PD, RK, SN.

I have put together some Matlab programs to find maximum likelihood estimates of some models and perform various other operations:

- Factor analysis
- Principal component analysis
- Mixture of multivariate Bernoulli distributions
- Tools for imaging EPG frames

You can find Matlab software for other models elsewhere:

- GTM (by Markus Svensén)
- Mixtures of Factor Analysers (by Zoubin Ghahramani)

- This is what an acrylic pseudopalate for use in EPG looks like: from the University of Reading; from UCLA Phonetics Lab
- The Articulatory Database Registry at the Centre for Speech Technology Research (CSTR) of the University of Edinburgh
- ACCOR II ESPRIT project and EUR-ACCOR detailed database description
- Electropalatography pages at the Department of Speech and Language Sciences of the Queen Margaret College, Edinburgh. These include:
- An online bibliography about EPG, prepared by Fiona Gibbon and Bill Hardcastle
- WinEPG, a Windows 95/NT program for EPG
- The second European Symposium on Electrolaryngography (ELG) and Electropalatography (EPG) held at the Queen Margaret College, Edinburgh in June 1997

- The Reading EPG
- The Articulograph AG100
- Electropalatography pages at UCLA Phonetics Lab
- The Vocal Tract Visualization Lab at the University of Maryland has produced a 3D tongue model
- The International Clinical Phonetics and Linguistics Association (ICPLA)
- Clinical Linguistics Bibliography by Jörg Mayer
- Noel Nguyen has produced EMA Tools, a Matlab package for the analysis of acoustic/articulatory data

Miguel A. Carreira-Perpinan Last modified: Mon Oct 3 01:23:07 PDT 2005