Errata to the 1st printing (10 9 8 7 6 5 4 3 2 1 on the ISBN page) of:
Ethem Alpaydin: "Introduction to Machine Learning", 3rd ed. MIT Press, 2014.
Miguel A. Carreira-Perpinan, 2015.

General comments:

- Many figures in the book have the wrong aspect ratio, and in some of them this affects the comprehension of the material. For example, fig. 7.2 about k-means is too stretched vertically, so that the distances are distorted and some points appear to be assigned to the wrong cluster. In fig. 6.12, the Optdigits plot after LDA is considerably distorted (it should be about twice as wide). Likewise fig. 6.16 and others.
- P. 22 eq. 2.3: x^t,r^t -> (x^t,r^t) (i.e., a set of ordered pairs).
- P. 36 eq. 2.17: there should be some space between \bar{x} and \bar{r}; otherwise it looks like \bar{xr} (i.e., the average of the products x^t r^t).
- P. 81 l. -8: E[g(x)] -> E_X[g(x)].
- P. 128 l. 1: the covariance is X'.X/N.
- P. 136: "their their" (duplicated word).
- P. 175 l. 16: "that is inversely proportional to the distance" strictly means 1/d, where d is the distance. It should say "that is a decreasing function of the distance".
- P. 176 l. 13: "The graph should always be connected". A disconnected graph still works, in that each connected component contains one (or more) clusters. In general, one should run a connected-components algorithm first and then apply spectral clustering to each component separately.
- P. 177 first eq.: (xrj - zsj)^p -> |xrj - zsj|^p (absolute value).
- P. 177 l. -11: "constructing the minimal spanning tree of the graph", e.g. using Kruskal's algorithm.
- P. 199 l. 14: "seperating" -> "separating".
- P. 220 l. 14: the "calligraphic I" symbol (impurity?) has not been defined.
- P. 235, exe. 1: the Gini index should be multiplied by 2 to be consistent with eq. (9.5).
- P. 248 l. 1: should be log(y/(1-y)) (with the extra parenthesis).
- P. 261 eq. (10.48): the [ ]+ operator should apply to w'.(xv-xu), not to xv-xu.
- P. 267: it is odd to consider MLPs as nonparametric methods.
- P. 310: "lingustics" -> "linguistics".
- P. 315: "Immenent" -> "Imminent".
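The p. 176 remark (run connected components first, then spectral clustering per component) can be sketched as follows. A minimal sketch in Python using scipy; the toy similarity graph and variable names are illustrative, not from the book:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy similarity graph with two disconnected components:
# nodes 0-2 form one component, nodes 3-5 the other.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]:
    W[i, j] = W[j, i] = 1.0

n_comp, labels = connected_components(csr_matrix(W), directed=False)
print(n_comp, labels)  # 2 [0 0 0 1 1 1]

# Spectral clustering would then be applied separately to each
# submatrix W[idx][:, idx], where idx indexes one component.
for c in range(n_comp):
    idx = np.where(labels == c)[0]
    W_c = W[np.ix_(idx, idx)]  # similarity matrix of component c
```

Running it on a single connected component avoids the degenerate zero eigenvectors that a disconnected graph Laplacian would otherwise introduce.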
- P. 326: "topogrophical" -> "topographical".
- P. 350 point 5: we cannot solve analytically for the optimum; a QP requires an iterative algorithm. Heuristics for learning rates, etc. are less crucial than for other nonlinear models, but still important depending on the QP optimization algorithm.
- P. 352, line after eq. (13.3): "this is a standard quadratic programming problem". Also p. 353 l. 10: "quadratic programming methods".
- P. 355: "sectiona" -> "section".
- Chapter 13: it would be clearer to show explicitly which variables are optimized over, e.g. in eq. (13.17) to write min_{w,w0,\rho,\xi_1...\xi_N}, to differentiate them from quantities such as \nu or N that are fixed and not optimized over.
- P. 370 l. -1: "we have r^t = w^T ..." (w instead of x).
- P. 497 l. -10: "a column of all 0s (or 1s)" -> of all -1s (or +1s). Also in "0101" and "1010".
- P. 557: "reproducable" -> "reproducible".
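The p. 261 correction (eq. 10.48) is that [z]+ = max(z, 0) must wrap the whole scalar score difference w'(xv - xu), not the vector xv - xu. A small numeric illustration in Python; the vectors are made up for the example:

```python
import numpy as np

def hinge(z):
    # [z]_+ = max(z, 0), applied elementwise
    return np.maximum(z, 0.0)

w = np.array([1.0, -2.0])
xu = np.array([0.5, 0.0])
xv = np.array([0.0, 1.0])

# Corrected reading: [w'(xv - xu)]_+ , the hinge of a scalar score.
correct = hinge(w @ (xv - xu))   # max(-2.5, 0) = 0.0

# Misreading: w' [xv - xu]_+ , clipping the vector first.
wrong = w @ hinge(xv - xu)       # w'[0, 1] = -2.0

print(correct, wrong)  # 0.0 -2.0
```

The two readings give different values (and the misreading can even go negative), which is why the placement of the operator matters.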