# The Eysenck data we have been working with were actually results
# from a group of old people. Eysenck also collected data from young
# people. Here is the data set:

> Eysenck2 <- read.csv("http://faculty.ucmerced.edu/jvevea/classes/105/data/Eysenck2.csv")
> Eysenck2
    Words       Group   Age
1       9    Counting   Old
2       8    Counting   Old
3       6    Counting   Old
4       8    Counting   Old
5      10    Counting   Old
6       4    Counting   Old
7       6    Counting   Old
8       5    Counting   Old
9       7    Counting   Old
10      7    Counting   Old
11      7     Rhyming   Old
12      9     Rhyming   Old
13      6     Rhyming   Old
14      6     Rhyming   Old
15      6     Rhyming   Old
16     11     Rhyming   Old
17      6     Rhyming   Old
18      3     Rhyming   Old
19      8     Rhyming   Old
20      7     Rhyming   Old
21     11   Adjective   Old
22     13   Adjective   Old
23      8   Adjective   Old
24      6   Adjective   Old
25     14   Adjective   Old
26     11   Adjective   Old
27     13   Adjective   Old
28     13   Adjective   Old
29     10   Adjective   Old
30     11   Adjective   Old
31     12     Imagery   Old
32     11     Imagery   Old
33     16     Imagery   Old
34     11     Imagery   Old
35      9     Imagery   Old
36     23     Imagery   Old
37     12     Imagery   Old
38     10     Imagery   Old
39     19     Imagery   Old
40     11     Imagery   Old
41     10 Intentional   Old
42     19 Intentional   Old
43     14 Intentional   Old
44      5 Intentional   Old
45     10 Intentional   Old
46     11 Intentional   Old
47     14 Intentional   Old
48     15 Intentional   Old
49     11 Intentional   Old
50     11 Intentional   Old
51      8    Counting Young
52      6    Counting Young
53      4    Counting Young
54      6    Counting Young
55      7    Counting Young
56      6    Counting Young
57      5    Counting Young
58      7    Counting Young
59      9    Counting Young
60      7    Counting Young
61     10     Rhyming Young
62      7     Rhyming Young
63      8     Rhyming Young
64     10     Rhyming Young
65      4     Rhyming Young
66      7     Rhyming Young
67     10     Rhyming Young
68      6     Rhyming Young
69      7     Rhyming Young
70      7     Rhyming Young
71     14   Adjective Young
72     11   Adjective Young
73     18   Adjective Young
74     14   Adjective Young
75     13   Adjective Young
76     22   Adjective Young
77     17   Adjective Young
78     16   Adjective Young
79     12   Adjective Young
80     11   Adjective Young
81     20     Imagery Young
82     16     Imagery Young
83     16     Imagery Young
84     15     Imagery Young
85     18     Imagery Young
86     16     Imagery Young
87     20     Imagery Young
88     22     Imagery Young
89     14     Imagery Young
90     19     Imagery Young
91     21 Intentional Young
92     19 Intentional Young
93     17 Intentional Young
94     15 Intentional Young
95     22 Intentional Young
96     16 Intentional Young
97     22 Intentional Young
98     22 Intentional Young
99     18 Intentional Young
100    21 Intentional Young

# Eysenck was actually interested in several questions:
#  -does level of processing affect memory?
#  -does age affect memory?
#  -does the effect of level of processing differ for the different ages?
# (That last question is the same as asking "Is there an interaction
# between age and level of processing?")
# Two-way ANOVA addresses those three questions. Here's how to do it, letting
# R do the coding:

> attach(Eysenck2)
> anova(lm(Words~Group+Age+Group*Age))
Analysis of Variance Table

Response: Words
          Df  Sum Sq Mean Sq F value    Pr(>F)    
Group      4 1514.94  378.74 47.1911 < 2.2e-16 ***
Age        1  240.25  240.25 29.9356 3.981e-07 ***
Group:Age  4  190.30   47.57  5.9279 0.0002793 ***
Residuals 90  722.30    8.03                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# The first F statistic addresses the question "Is there an effect of
# level of processing, if we ignore the fact that there are two different
# age groups?" (And the answer is YES, p < 2.2e-16.)
# The second F statistic asks "Is there an effect of age, if we ignore
# the fact that people were in groups with different levels of processing?"
# Again, the answer is YES, p = 3.981e-07.
# The third F statistic asks "Is there an interaction (i.e., does the effect
# of level of processing differ for the two age groups)?" And again, the
# answer is YES, p = 0.0002793.
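# (An aside, not part of the original analysis: in R's formula language
# the '*' operator expands to both main effects plus the interaction, so
# the model above can be specified more compactly. The line below should
# reproduce the identical table:)

> anova(lm(Words~Group*Age))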
# An interaction plot (superimposing line plots for different levels of a
# factor) is a useful way to visualize two-way ANOVA. There are packages that
# can do such plots, but usually you'll get a nicer result if you do it yourself.
# Here's an example:

> par(pin=c(6,4))
> plot(c(0,7), c(5,22), type='n', xlab="Group", ylab="Mean", axes=FALSE)
> box()
> tapply(Words[1:50], Group[1:50], mean)
    Imagery     Rhyming    Counting   Adjective Intentional 
       13.4         6.9         7.0        11.0        12.0 

# (Note that we type the means into the vector in the order we want to
# plot them: Counting, Rhyming, Adjective, Imagery, Intentional.)

> Old <- c(7, 6.9, 11, 13.4, 12)
> tapply(Words[51:100], Group[51:100], mean)
    Imagery     Rhyming    Counting   Adjective Intentional 
       17.6         7.6         6.5        14.8        19.3 
> Young <- c(6.5, 7.6, 14.8, 17.6, 19.3)
> lines(1:5, Old, type='b')
> lines(1:5, Young, type='b')
> axis(side=1, at=1:5, lab=c("Counting","Rhyming","Adjective","Imagery","Intentional"))
> axis(side=2, at=seq(5,20,5))
> text(5.2, 12, "Old", adj=0)
> text(5.2, 19.3, "Young", adj=0)

# Here's the ANOVA again:

> anova(lm(Words~Group+Age+Group*Age))
Analysis of Variance Table

Response: Words
          Df  Sum Sq Mean Sq F value    Pr(>F)    
Group      4 1514.94  378.74 47.1911 < 2.2e-16 ***
Age        1  240.25  240.25 29.9356 3.981e-07 ***
Group:Age  4  190.30   47.57  5.9279 0.0002793 ***
Residuals 90  722.30    8.03                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# In that table, the sums of squares for the main effects are the
# same as what you would get if you simply ignored the other factor
# and did a one-way ANOVA:

> anova(lm(Words~Group))
Analysis of Variance Table

Response: Words
          Df Sum Sq Mean Sq F value    Pr(>F)    
Group      4 1514.9  378.74  31.209 < 2.2e-16 ***
Residuals 95 1152.8   12.14                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> anova(lm(Words~Age))
Analysis of Variance Table

Response: Words
          Df  Sum Sq Mean Sq F value   Pr(>F)   
Age        1  240.25 240.250  9.6989 0.002418 **
Residuals 98 2427.54  24.771                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# (The 1514.9 vs 1514.94 is just rounding.)
# For the interaction, we calculate a sum of squares across all
# 10 cells of the design...

> cell <- c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10),
+           rep(6,10),rep(7,10),rep(8,10),rep(9,10),rep(10,10))
> table(cell)
cell
  1   2   3   4   5   6   7   8   9  10 
 10  10  10  10  10  10  10  10  10  10 
> anova(lm(Words~factor(cell)))
Analysis of Variance Table

Response: Words
             Df Sum Sq Mean Sq F value    Pr(>F)    
factor(cell)  9 1945.5 216.166  26.935 < 2.2e-16 ***
Residuals    90  722.3   8.026                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# That represents the total variation among the 10 cell means, but we have
# already attributed some of that variation to the main effects. So if we
# subtract the main effect sums of squares, we are left with the
# interaction sum of squares:

> 1945.5 - 240.25 - 1514.94
[1] 190.31

# (Again, the 190.3 vs 190.31 is just rounding error.)
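# (An aside, not part of the original handout: the cell variable above can
# also be built without all those rep() calls, using the base-R interaction()
# function. The following should produce an equivalent 10-level factor and
# the same ANOVA table as the factor(cell) model above:)

> cell2 <- interaction(Group, Age)
> anova(lm(Words~cell2))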
# If we wanted to code the ANOVA ourselves, we would start by creating
# a dummy coding system for each factor. Here's a dummy coding system
# for level of processing:

> g1 <- c(rep(1,10),rep(0,40),rep(1,10),rep(0,40))
> g2 <- c(rep(0,10),rep(1,10),rep(0,30),rep(0,10),rep(1,10),rep(0,30))
> g3 <- c(rep(0,20),rep(1,10),rep(0,20),rep(0,20),rep(1,10),rep(0,20))
> g4 <- c(rep(0,30),rep(1,10),rep(0,10),rep(0,30),rep(1,10),rep(0,10))

# Note that each strategy group is identified by a unique pattern:

> cbind(g1,g2,g3,g4)
       g1 g2 g3 g4
  [1,]  1  0  0  0
  [2,]  1  0  0  0
  [3,]  1  0  0  0
  [4,]  1  0  0  0
  [5,]  1  0  0  0
  [6,]  1  0  0  0
  [7,]  1  0  0  0
  [8,]  1  0  0  0
  [9,]  1  0  0  0
 [10,]  1  0  0  0
 [11,]  0  1  0  0
 [12,]  0  1  0  0
 [13,]  0  1  0  0
 [14,]  0  1  0  0
 [15,]  0  1  0  0
 [16,]  0  1  0  0
 [17,]  0  1  0  0
 [18,]  0  1  0  0
 [19,]  0  1  0  0
 [20,]  0  1  0  0
 [21,]  0  0  1  0
 [22,]  0  0  1  0
 [23,]  0  0  1  0
 [24,]  0  0  1  0
 [25,]  0  0  1  0
 [26,]  0  0  1  0
 [27,]  0  0  1  0
 [28,]  0  0  1  0
 [29,]  0  0  1  0
 [30,]  0  0  1  0
 [31,]  0  0  0  1
 [32,]  0  0  0  1
 [33,]  0  0  0  1
 [34,]  0  0  0  1
 [35,]  0  0  0  1
 [36,]  0  0  0  1
 [37,]  0  0  0  1
 [38,]  0  0  0  1
 [39,]  0  0  0  1
 [40,]  0  0  0  1
 [41,]  0  0  0  0
 [42,]  0  0  0  0
 [43,]  0  0  0  0
 [44,]  0  0  0  0
 [45,]  0  0  0  0
 [46,]  0  0  0  0
 [47,]  0  0  0  0
 [48,]  0  0  0  0
 [49,]  0  0  0  0
 [50,]  0  0  0  0
 [51,]  1  0  0  0
 [52,]  1  0  0  0
 [53,]  1  0  0  0
 [54,]  1  0  0  0
 [55,]  1  0  0  0
 [56,]  1  0  0  0
 [57,]  1  0  0  0
 [58,]  1  0  0  0
 [59,]  1  0  0  0
 [60,]  1  0  0  0
 [61,]  0  1  0  0
 [62,]  0  1  0  0
 [63,]  0  1  0  0
 [64,]  0  1  0  0
 [65,]  0  1  0  0
 [66,]  0  1  0  0
 [67,]  0  1  0  0
 [68,]  0  1  0  0
 [69,]  0  1  0  0
 [70,]  0  1  0  0
 [71,]  0  0  1  0
 [72,]  0  0  1  0
 [73,]  0  0  1  0
 [74,]  0  0  1  0
 [75,]  0  0  1  0
 [76,]  0  0  1  0
 [77,]  0  0  1  0
 [78,]  0  0  1  0
 [79,]  0  0  1  0
 [80,]  0  0  1  0
 [81,]  0  0  0  1
 [82,]  0  0  0  1
 [83,]  0  0  0  1
 [84,]  0  0  0  1
 [85,]  0  0  0  1
 [86,]  0  0  0  1
 [87,]  0  0  0  1
 [88,]  0  0  0  1
 [89,]  0  0  0  1
 [90,]  0  0  0  1
 [91,]  0  0  0  0
 [92,]  0  0  0  0
 [93,]  0  0  0  0
 [94,]  0  0  0  0
 [95,]  0  0  0  0
 [96,]  0  0  0  0
 [97,]  0  0  0  0
 [98,]  0  0  0  0
 [99,]  0  0  0  0
[100,]  0  0  0  0

# And here's a dummy code for age:

> a1 <- c(rep(1,50),rep(0,50))

# We capture the interaction by taking the cross products
# of the variables in the two dummy coding systems:

> i1 <- g1*a1
> i2 <- g2*a1
> i3 <- g3*a1
> i4 <- g4*a1

# If we regress on those nine variables, we get a model that
# has the same residual sum of squares as the two-way ANOVA:

> anova(lm(Words~g1+g2+g3+g4+a1+i1+i2+i3+i4))
Analysis of Variance Table

Response: Words
          Df Sum Sq Mean Sq  F value    Pr(>F)    
g1         1 590.49  590.49  73.5762 2.600e-13 ***
g2         1 828.82  828.82 103.2722 < 2.2e-16 ***
g3         1  95.41   95.41  11.8881 0.0008611 ***
g4         1   0.22    0.22   0.0280 0.8674012    
a1         1 240.25  240.25  29.9356 3.981e-07 ***
i1         1  81.00   81.00  10.0928 0.0020398 ** 
i2         1  72.60   72.60   9.0461 0.0034129 ** 
i3         1  12.67   12.67   1.5793 0.2121092    
i4         1  24.03   24.03   2.9936 0.0870238 .  
Residuals 90 722.30    8.03                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> anova(lm(Words~Group+Age+Group*Age))
Analysis of Variance Table

Response: Words
          Df  Sum Sq Mean Sq F value    Pr(>F)    
Group      4 1514.94  378.74 47.1911 < 2.2e-16 ***
Age        1  240.25  240.25 29.9356 3.981e-07 ***
Group:Age  4  190.30   47.57  5.9279 0.0002793 ***
Residuals 90  722.30    8.03                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
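# (An aside, not part of the original handout: R builds codes like these
# automatically whenever a factor appears in a model formula, and you can
# inspect them with model.matrix(). R's default "treatment" coding may
# choose a different reference category than we did above, but it spans
# the same model:)

> head(model.matrix(~Group*Age, data=Eysenck2))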
# But how do we divide the variability into parts associated
# with the interaction and with each main effect? We sequentially
# drop variables and see how the residual sum of squares changes.
# First, we drop the interaction variables:

> anova(lm(Words~g1+g2+g3+g4+a1))
Analysis of Variance Table

Response: Words
          Df Sum Sq Mean Sq F value    Pr(>F)    
g1         1 590.49  590.49 60.8219 8.419e-12 ***
g2         1 828.82  828.82 85.3701 7.609e-15 ***
g3         1  95.41   95.41  9.8273  0.002295 ** 
g4         1   0.22    0.22  0.0232  0.879328    
a1         1 240.25  240.25 24.7463 2.943e-06 ***
Residuals 94 912.60    9.71                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# The residual sum of squares has changed by an amount equal to
# the interaction sum of squares:

> 912.6 - 722.3
[1] 190.3

# Then we drop the age variable:

> anova(lm(Words~g1+g2+g3+g4))
Analysis of Variance Table

Response: Words
          Df  Sum Sq Mean Sq F value    Pr(>F)    
g1         1  590.49  590.49 48.6590 4.048e-10 ***
g2         1  828.82  828.82 68.2982 8.326e-13 ***
g3         1   95.41   95.41  7.8621  0.006121 ** 
g4         1    0.22    0.22  0.0185  0.891979    
Residuals 95 1152.85   12.14                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# The residual sum of squares has changed by an amount equal to the
# age sum of squares:

> 1152.85 - 912.6
[1] 240.25

# Finally, we drop the four variables that define the strategy factor:

> anova(lm(Words~1))
Analysis of Variance Table

Response: Words
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 99 2667.8  26.947               

# The residual sum of squares has changed by an amount equal to the
# strategy sum of squares (with a little rounding error):

> 2667.8 - 1152.85
[1] 1514.95

# To check the assumptions of equal variability and normality within
# each of the 10 populations represented by our crossed factors, we can
# use the "cell" variable we defined before:

> table(cell)
cell
  1   2   3   4   5   6   7   8   9  10 
 10  10  10  10  10  10  10  10  10  10 

# The standard deviations in all 10 groups look pretty similar:

> tapply(Words, cell, sd)
       1        2        3        4        5        6        7        8 
1.825742 2.131770 2.494438 4.501851 3.741657 1.433721 1.955050 3.489667 
       9       10 
2.590581 2.668749 

# We could also check normality in each cell. Here, we look at three of the
# 10; but it's pretty futile to try to assess normality with only 10 observations:

> boxplot(Words~cell)
> qqnorm(Words[1:10]); qqline(Words[1:10])
> qqnorm(Words[51:60]); qqline(Words[51:60])
> qqnorm(Words[61:70]); qqline(Words[61:70])

# This idea of looking at how a model changes when we move variables in and out
# of the predictor set can be pretty useful. For example, if we include Sex and
# an interaction between Sex and Raven, we can do a single multiple regression that
# actually estimates a separate regression for boys and girls.

> detach(Eysenck2)
> attach(JackStatlab)

# Here's the overall regression of Peabody on Raven:

> summary(lm(CTPEA~CTRA))

Call:
lm(formula = CTPEA ~ CTRA)

Residuals:
    Min      1Q  Median      3Q     Max 
-26.520  -8.273  -1.516   6.915  42.459 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  59.0490     5.9663   9.897 3.55e-13 ***
CTRA          0.7319     0.1856   3.942 0.000262 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.87 on 48 degrees of freedom
Multiple R-squared:  0.2446,    Adjusted R-squared:  0.2288 
F-statistic: 15.54 on 1 and 48 DF,  p-value: 0.0002616

> table(CBSEX)
CBSEX
 0  1 
26 24 

# Here's the regression for the girls:

> lm(CTPEA[1:26]~CTRA[1:26])

Call:
lm(formula = CTPEA[1:26] ~ CTRA[1:26])

Coefficients:
(Intercept)   CTRA[1:26]  
    53.3569       0.7559  

# And for the boys:

> lm(CTPEA[27:50]~CTRA[27:50])

Call:
lm(formula = CTPEA[27:50] ~ CTRA[27:50])

Coefficients:
(Intercept)  CTRA[27:50]  
    65.5195       0.6958  

# The two regressions look pretty different.
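# (An aside, not part of the original handout: a quick way to see the
# difference is to overlay the two fitted lines on a scatterplot, using the
# coefficients we just estimated. A rough sketch, assuming as above that the
# girls are rows 1-26; girls plot as circles, boys as triangles:)

> plot(CTRA, CTPEA, pch=CBSEX+1)
> abline(53.3569, 0.7559, lty=1)   # girls' fitted line
> abline(65.5195, 0.6958, lty=2)   # boys' fitted line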
# Here's the sex dummy variable:

> CBSEX
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1

# If we create an interaction by multiplying sex by Raven, we get a variable
# that has the value 0 for girls and Raven for boys:

> int <- CBSEX*CTRA
> int
 [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
[26]  0 21 28 38 28 31 39 21 37 21 22 23 18 28 39 42 26 33 41 34 35 30 33 28 42

# And now if we regress Peabody on Raven, sex, and the interaction, we effectively
# do both regressions in one step:

> summary(lm(CTPEA~CTRA+CBSEX+int))

Call:
lm(formula = CTPEA ~ CTRA + CBSEX + int)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.657  -5.926  -0.632   5.694  36.997 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 53.35693    6.67198   7.997 2.99e-10 ***
CTRA         0.75587    0.20446   3.697  0.00058 ***
CBSEX       12.16261   11.74089   1.036  0.30566    
int         -0.06003    0.36784  -0.163  0.87108    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.87 on 46 degrees of freedom
Multiple R-squared:  0.3933,    Adjusted R-squared:  0.3537 
F-statistic: 9.938 on 3 and 46 DF,  p-value: 3.633e-05

# Here's the intercept for the girls:

> 53.35693
[1] 53.35693

# And for the boys:

> 53.35693 + 12.16261
[1] 65.51954

# The slope for the girls is .75587, but for the boys, it's

> 0.75587 - 0.06003
[1] 0.69584

# Note that those intercepts and slopes match the ones we got
# when we did separate regressions.
# Now we need the error mean square from the complex model:

> anova(lm(CTPEA~CTRA+CBSEX+int))
Analysis of Variance Table

Response: CTPEA
          Df Sum Sq Mean Sq F value    Pr(>F)    
CTRA       1 2189.0 2189.00 18.5431 8.625e-05 ***
CBSEX      1 1327.4 1327.36 11.2441  0.001606 ** 
int        1    3.1    3.14  0.0266  0.871083    
Residuals 46 5430.3  118.05                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Here's the model that doesn't distinguish between boys and girls:

> anova(lm(CTPEA~CTRA))
Analysis of Variance Table

Response: CTPEA
          Df Sum Sq Mean Sq F value    Pr(>F)    
CTRA       1 2189.0 2189.00  15.541 0.0002616 ***
Residuals 48 6760.8  140.85                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# The error sum of squares has changed, and there's been a 2 df change,
# so the mean square for the change is:

> MSchange <- (6760.8-5430.3) / 2
> MSchange
[1] 665.25

# We can calculate an F statistic that asks if the two regressions differ,
# using the error mean square from the complex regression:

> Fchange <- 665.25 / 118.05
> Fchange
[1] 5.635324

# And we find that the regression IS different for boys and girls:

> 1-pf(Fchange, 2, 46)
[1] 0.006471305
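# (A final aside, not part of the original handout: R will perform this
# same model-comparison F test directly if you hand anova() the two nested
# models. The following sketch should reproduce essentially the same F
# statistic and p-value we just computed by hand:)

> simple <- lm(CTPEA~CTRA)
> complex <- lm(CTPEA~CTRA+CBSEX+int)
> anova(simple, complex)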