# Here's the math practice example once more:

> attach(math2)
> math2
   Time Practice Reward
1    51   massed   none
2    96   massed   none
3    97   massed   none
4    22   massed   none
5    35   massed   none
6    36   massed   none
7    28   massed   none
8    76   massed   none
9    39   spaced   none
10  104   spaced   none
11  130   spaced   none
12  122   spaced   none
13  114   spaced   none
14   92   spaced   none
15   87   spaced   none
16   64   spaced   none
17   42     none   none
18   92     none   none
19  156     none   none
20  144     none   none
21  133     none   none
22  124     none   none
23   68     none   none
24  142     none   none
25   26   massed  small
26   41   massed  small
27   28   massed  small
28   92   massed  small
29   14   massed  small
30   16   massed  small
31   29   massed  small
32   31   massed  small
33   41   spaced  small
34   26   spaced  small
35   19   spaced  small
36   59   spaced  small
37   82   spaced  small
38   86   spaced  small
39   45   spaced  small
40   37   spaced  small
41   36     none  small
42   39     none  small
43   59     none  small
44   27     none  small
45   87     none  small
46   99     none  small
47  126     none  small
48  104     none  small

# Here, we repeat the creation of a dummy coding system for the
# two-way ANOVA problem. It takes two dummy variables to identify
# the three practice conditions:

> d1 <- c(rep(1,8),rep(0,16),rep(1,8),rep(0,16))
> d2 <- c(rep(0,8),rep(1,8),rep(0,8))
> d2 <- c(d2,d2)
> cbind(d1,d2)
      d1 d2
 [1,]  1  0
 [2,]  1  0
 [3,]  1  0
 [4,]  1  0
 [5,]  1  0
 [6,]  1  0
 [7,]  1  0
 [8,]  1  0
 [9,]  0  1
[10,]  0  1
[11,]  0  1
[12,]  0  1
[13,]  0  1
[14,]  0  1
[15,]  0  1
[16,]  0  1
[17,]  0  0
[18,]  0  0
[19,]  0  0
[20,]  0  0
[21,]  0  0
[22,]  0  0
[23,]  0  0
[24,]  0  0
[25,]  1  0
[26,]  1  0
[27,]  1  0
[28,]  1  0
[29,]  1  0
[30,]  1  0
[31,]  1  0
[32,]  1  0
[33,]  0  1
[34,]  0  1
[35,]  0  1
[36,]  0  1
[37,]  0  1
[38,]  0  1
[39,]  0  1
[40,]  0  1
[41,]  0  0
[42,]  0  0
[43,]  0  0
[44,]  0  0
[45,]  0  0
[46,]  0  0
[47,]  0  0
[48,]  0  0

# It takes one dummy variable to identify the two reward conditions:

> rdummy <- c(rep(0,24),rep(1,24))

# Here's the coding system for the two main effects:

> cbind(d1,d2,rdummy)
      d1 d2 rdummy
 [1,]  1  0      0
 [2,]  1  0      0
 [3,]  1  0      0
 [4,]  1  0      0
 [5,]  1  0      0
 [6,]  1  0      0
 [7,]  1  0      0
 [8,]  1  0      0
 [9,]  0  1      0
[10,]  0  1      0
[11,]  0  1      0
[12,]  0  1      0
[13,]  0  1      0
[14,]  0  1      0
[15,]  0  1      0
[16,]  0  1      0
[17,]  0  0      0
[18,]  0  0      0
[19,]  0  0      0
[20,]  0  0      0
[21,]  0  0      0
[22,]  0  0      0
[23,]  0  0      0
[24,]  0  0      0
[25,]  1  0      1
[26,]  1  0      1
[27,]  1  0      1
[28,]  1  0      1
[29,]  1  0      1
[30,]  1  0      1
[31,]  1  0      1
[32,]  1  0      1
[33,]  0  1      1
[34,]  0  1      1
[35,]  0  1      1
[36,]  0  1      1
[37,]  0  1      1
[38,]  0  1      1
[39,]  0  1      1
[40,]  0  1      1
[41,]  0  0      1
[42,]  0  0      1
[43,]  0  0      1
[44,]  0  0      1
[45,]  0  0      1
[46,]  0  0      1
[47,]  0  0      1
[48,]  0  0      1

# The interaction is captured by multiplying each dummy variable
# for the first factor by each (in this case, one) dummy variable
# for the second factor:

> i1 <- d1*rdummy
> i2 <- d2*rdummy

# Here's the ANOVA, letting R worry about the coding:

> anova(lm(Time~Practice+Reward+Practice*Reward))
Analysis of Variance Table

Response: Time
                Df Sum Sq Mean Sq F value    Pr(>F)    
Practice         2  18150  9075.0  8.9331 0.0005854 ***
Reward           1  14876 14875.5 14.6428 0.0004254 ***
Practice:Reward  2   1332   666.0  0.6556 0.5243595    
Residuals       42  42667  1015.9                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# In response to a student question, we noted that the
# order in which we list the factors doesn't matter:

> anova(lm(Time~Reward+Practice+Practice*Reward))
Analysis of Variance Table

Response: Time
                Df Sum Sq Mean Sq F value    Pr(>F)    
Reward           1  14876 14875.5 14.6428 0.0004254 ***
Practice         2  18150  9075.0  8.9331 0.0005854 ***
Reward:Practice  2   1332   666.0  0.6556 0.5243595    
Residuals       42  42667  1015.9                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
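# An aside on why the order doesn't matter here: the design is balanced
# (eight observations in each of the six cells), so the sequential sums
# of squares for Practice and Reward don't overlap. A quick check of the
# cell counts (a sketch only; output not shown; assumes math2 is still
# attached):

table(Practice, Reward)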
# Here's the ANOVA using the codes we created:

> anova(lm(Time~d1+d2+rdummy+i1+i2))
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value    Pr(>F)    
d1         1  14726 14726.3 14.4959 0.0004507 ***
d2         1   3424  3423.8  3.3702 0.0734709 .  
rdummy     1  14876 14875.5 14.6428 0.0004254 ***
i1         1   1298  1298.0  1.2777 0.2647396    
i2         1     34    34.0  0.0335 0.8556574    
Residuals 42  42667  1015.9                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# The results haven't been nicely divided into tests
# about the two main effects and the interaction, but
# the residual sum of squares is the same, so we know
# that we have actually done the same ANOVA.

# There are three tests that interest us: the test for
# the main effect of practice, the test for the main effect
# of reward, and the test for the interaction. We can isolate
# those questions by doing nested F tests.

# We start by considering the interaction. If we drop i1 and i2,
# the variables that represent the interaction...

> anova(lm(Time~d1+d2+rdummy))
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value    Pr(>F)    
d1         1  14726 14726.3 14.7265 0.0003935 ***
d2         1   3424  3423.8  3.4238 0.0709836 .  
rdummy     1  14876 14875.5 14.8757 0.0003709 ***
Residuals 44  43999  1000.0                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# ...the residual (within groups) sum of squares changes. How
# much does it change?

> SSint <- 43999-42667

# Note that the SS change associated with dropping the interaction
# variables is the same as the interaction SS we got when we let
# R code the ANOVA:

> SSint
[1] 1332

# We dropped two predictors, so that's the df associated with
# the change. Here's the mean square for the change, and its ratio
# to the residual mean square:

> MSint <- SSint/2
> MSint/1015.9
[1] 0.6555763

# MSint (666) is the same as the interaction mean square in R's version
# of the ANOVA, and dividing it by the residual mean square reproduces
# the interaction F value (0.6556), so the nested test leads to the same
# F test.

# If we now drop the two variables that code the practice main effect...

> anova(lm(Time~rdummy))
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value   Pr(>F)   
rdummy     1  14876 14875.5   11.01 0.001777 **
Residuals 46  62149  1351.1                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# ...the change in sum of squares is the same as the SS for the
# main effect of practice:

> 62149 - 43999
[1] 18150

# That ANOVA table also showed us the SS for the main effect
# of reward, as that's the only predictor left in the model.
# However, if we wanted to pursue the idea of thinking about
# nested tests, we could observe that the change in residual
# sum of squares when we move to a model with NO predictors
# is the same as the reward main effect sum of squares:

> anova(lm(Time~1))
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 47  77025  1638.8               

> 77025 - 62149
[1] 14876
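# As a sketch (output not shown), the same nested F tests can be obtained
# directly by handing anova() a reduced and a full model; it computes the
# change in residual SS and the corresponding F statistic for us. For the
# interaction test, assuming d1, d2, rdummy, i1, and i2 are still in the
# workspace:

reduced <- lm(Time ~ d1 + d2 + rdummy)           # model without the interaction codes
full <- lm(Time ~ d1 + d2 + rdummy + i1 + i2)    # model with the interaction codes
anova(reduced, full)                             # F should match the 0.6556 computed above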
# My Kelly/Nils example considers midterm scores of male and
# female students who have male (Nils) or female (Kelly) TAs.
# The interaction plot (posted at the Virtual Whiteboard link
# from the class web page) appears to show that female students
# with the male TA perform much worse than anyone else.

> kellynils <- read.csv(file.choose())
> attach(kellynils)

# The ANOVA, though, suggests that there is no evidence of ANY
# kind of effect: no interaction, no TA effect, and no student
# gender effect:

> anova(lm(Score~TASex+STSex+TASex*STSex))
Analysis of Variance Table

Response: Score
            Df Sum Sq Mean Sq F value Pr(>F)
TASex        1  118.1  118.08  0.4695 0.4998
STSex        1   67.6   67.58  0.2687 0.6089
TASex:STSex  1  126.4  126.44  0.5028 0.4851
Residuals   24 6035.6  251.49               

# We moved on to consider assumption checking for the math practice
# example. Here's an easy way to compare standard deviations across
# cells. First, create a variable that identifies the cells:

> c(rep(1,8),rep(2,8),rep(3,8),rep(4,8),rep(5,8),rep(6,8)) -> junk
> junk
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5
[39] 5 5 6 6 6 6 6 6 6 6

# Then use tapply() to compare sd's within the cells:

> tapply(Time, junk, sd)
       1        2        3        4        5        6 
30.42761 30.60812 40.84443 24.69203 24.52368 36.78679 

# A typical sd is the square root of the within-groups mean square
# (taken from the earlier ANOVA tables):

> sqrt(1015.9)
[1] 31.87319

# Taking normal samples of N=8 from a distribution with that sd
# results in sample sd's that are more spread out than those in
# our six cells, so we needn't worry about different variation
# across the populations:

> sd(rnorm(8, 0, 31.9))
[1] 37.9904
> sd(rnorm(8, 0, 31.9))
[1] 17.83066
> sd(rnorm(8, 0, 31.9))
[1] 32.12676
> sd(rnorm(8, 0, 31.9))
[1] 27.38573
> sd(rnorm(8, 0, 31.9))
[1] 37.79727
> sd(rnorm(8, 0, 31.9))
[1] 39.11975
> sd(rnorm(8, 0, 31.9))
[1] 26.59996
> sd(rnorm(8, 0, 31.9))
[1] 24.49621
> sd(rnorm(8, 0, 31.9))
[1] 34.60416
> sd(rnorm(8, 0, 31.9))
[1] 20.99517
> sd(rnorm(8, 0, 31.9))
[1] 39.41191
> sd(rnorm(8, 0, 31.9))
[1] 46.88346

# We could try to evaluate normality within each of those six
# cells (here, we look at the first cell), but with a sample
# size of only 8, there's not really much information about
# whether the assumption is satisfied:

> qqnorm(Time[1:8]); qqline(Time[1:8])
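# As a sketch (plots not shown), we could look at all six cells at once
# by looping over the cell labels in junk, assuming Time and junk are
# still defined as above:

par(mfrow = c(2, 3))                  # arrange the six plots in a 2 x 3 grid
for (j in 1:6) {
  qqnorm(Time[junk == j], main = paste("Cell", j))
  qqline(Time[junk == j])
}
par(mfrow = c(1, 1))                  # back to one plot per figure

# With only eight observations per cell, the pooled residuals from the
# two-way model may give a somewhat more informative picture:

fit <- lm(Time ~ Practice*Reward)
qqnorm(resid(fit)); qqline(resid(fit))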