# Here's that Eysenck memory data set again: > attach(Eysenck) > Eysenck Score Group 1 9 counting 2 8 counting 3 6 counting 4 8 counting 5 10 counting 6 4 counting 7 6 counting 8 5 counting 9 7 counting 10 7 counting 11 7 rhyming 12 9 rhyming 13 6 rhyming 14 6 rhyming 15 6 rhyming 16 11 rhyming 17 6 rhyming 18 3 rhyming 19 8 rhyming 20 7 rhyming 21 11 adjective 22 13 adjective 23 8 adjective 24 6 adjective 25 14 adjective 26 11 adjective 27 13 adjective 28 13 adjective 29 10 adjective 30 11 adjective 31 12 imagery 32 11 imagery 33 16 imagery 34 11 imagery 35 9 imagery 36 23 imagery 37 12 imagery 38 10 imagery 39 19 imagery 40 11 imagery 41 10 intent 42 19 intent 43 14 intent 44 5 intent 45 10 intent 46 11 intent 47 14 intent 48 15 intent 49 11 intent 50 11 intent # If we visualize the groups using parallel boxplots, R will put # the groups in alphabetical order, which is not how we want to # think about them: > boxplot(Score~Group) # We can get around that by creating a "group" variable that R # will see in the same order that we want: > group <- c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)) > boxplot(Score~group) # But now we don't have good axis labels. We can correct that by # suppressing the axes... > par(pin=.8*c(6,4)) > boxplot(Score~group,xlab="Group",axes=FALSE) # We add back the vertical axis and frame the plot: > axis(side=2,at=c(5,10,15,20)) > box() # And now we make a lucky guess that the positions of the plots # are at locations 1, 2, 3, 4, and 5 on the X-axis, and we add # informative labels: > axis(side=1,at=1:5,lab=c("Counting","Rhyming","Adjective","Imagery","Intentional")) # One shortcoming of boxplots to depict the ANOVA is that they # center on the median, not the mean. But it's easy to add the # means to the plot: > points(1:5,tapply(Score,group,mean)) # A barplot is another way to show the means. (These are often # done with whiskers to indicate one standard error or one standard # deviation, but in R that requires a special package.) > barplot(tapply(Score,group,mean)) > abline(h=0) # (We could use the same approach to improve labeling that we did # above with the boxplots.) # Another way to depict the means is with a line plot: > plot(1:5,tapply(Score,group,mean),type='b') # (There's a lot we could have done using familiar techniques to # improve that plot: meaningful labels, more white space, etc. But # we didn't take the time.) # A student asked if we could add inferential information # to the plot. Here's the ANOVA: > anova(lm(Score~Group)) Analysis of Variance Table Response: Score Df Sum Sq Mean Sq F value Pr(>F) Group 4 351.52 87.880 9.0848 1.815e-05 *** Residuals 45 435.30 9.673 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # We add the F statistic and p value to the plot: > text(4,7.5,"F(4,45) = 9.08, p < .001") # One post hoc test that is available in base R is # Tukey's HSD: > TukeyHSD(aov(lm(Score~Group))) Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = lm(Score ~ Group)) $Group diff lwr upr p adj counting-adjective -4.0 -7.952238 -0.04776201 0.0460196 imagery-adjective 2.4 -1.552238 6.35223799 0.4291513 intent-adjective 1.0 -2.952238 4.95223799 0.9510451 rhyming-adjective -4.1 -8.052238 -0.14776201 0.0385792 imagery-counting 6.4 2.447762 10.35223799 0.0003180 intent-counting 5.0 1.047762 8.95223799 0.0068354 rhyming-counting -0.1 -4.052238 3.85223799 0.9999937 intent-imagery -1.4 -5.352238 2.55223799 0.8510119 rhyming-imagery -6.5 -10.452238 -2.54776201 0.0002524 rhyming-intent -5.1 -9.052238 -1.14776201 0.0055623 # Many other post hoc procedures are available in the package "DescTools," # but the package installation hung when I attempted to demonstrate: > utils:::menuInstallPkgs() --- Please select a CRAN mirror for use in this session --- also installing the dependencies ‘proxy’, ‘rootSolve’, ‘e1071’, ‘lmom’, ‘mvtnorm’, ‘expm’, ‘Rcpp’, ‘rstudioapi’, ‘Exact’, ‘gld’, ‘data.table’, ‘BH’ trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/proxy_0.4-25.zip' Content type 'application/zip' length 244671 bytes (238 KB) downloaded 238 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/rootSolve_1.8.2.1.zip' Content type 'application/zip' length 789394 bytes (770 KB) downloaded 770 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/e1071_1.7-6.zip' Content type 'application/zip' length 1022784 bytes (998 KB) downloaded 998 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/lmom_2.8.zip' Content type 'application/zip' length 592986 bytes (579 KB) downloaded 579 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/mvtnorm_1.1-1.zip' Content type 'application/zip' length 268952 bytes (262 KB) downloaded 262 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/expm_0.999-6.zip' Content type 'application/zip' length 239114 bytes (233 KB) downloaded 233 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/Rcpp_1.0.6.zip' Content type 'application/zip' length 3253620 bytes (3.1 MB) downloaded 3.1 MB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/rstudioapi_0.13.zip' Content type 'application/zip' length 302884 bytes (295 KB) downloaded 295 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/Exact_2.1.zip' Content type 'application/zip' length 107178 bytes (104 KB) downloaded 104 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/gld_2.6.2.zip' Content type 'application/zip' length 251894 bytes (245 KB) downloaded 245 KB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/data.table_1.14.0.zip' Content type 'application/zip' length 2602848 bytes (2.5 MB) downloaded 2.5 MB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/BH_1.75.0-0.zip' Content type 'application/zip' length 19674958 bytes (18.8 MB) downloaded 18.8 MB trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/DescTools_0.99.41.zip' Content type 'application/zip' length 5984299 bytes (5.7 MB) downloaded 5.7 MB package ‘proxy’ successfully unpacked and MD5 sums checked package ‘rootSolve’ successfully unpacked and MD5 sums checked package ‘e1071’ successfully unpacked and MD5 sums checked package ‘lmom’ successfully unpacked and MD5 sums checked package ‘mvtnorm’ successfully unpacked and MD5 sums checked package ‘expm’ successfully unpacked and MD5 sums checked package ‘Rcpp’ successfully unpacked and MD5 sums checked package ‘rstudioapi’ successfully unpacked and MD5 sums checked package ‘Exact’ successfully unpacked and MD5 sums checked package ‘gld’ successfully unpacked and MD5 sums checked package ‘data.table’ successfully unpacked and MD5 sums checked > library(DescTools) Error in library(DescTools) : there is no package called ‘DescTools’ # Here, we create a system of four dummy codes that allow # us to implement the ANOVA as a multiple regression problem: > d1 <- c(rep(1,10),rep(0,40)) > d2 <- c(rep(0,10),rep(1,10),rep(0,30)) > d3 <- c(rep(0,20),rep(1,10),rep(0,20)) > d4 <- c(rep(0,30),rep(1,10),rep(0,10)) > cbind(Score,d1,d2,d3,d4) Score d1 d2 d3 d4 [1,] 9 1 0 0 0 [2,] 8 1 0 0 0 [3,] 6 1 0 0 0 [4,] 8 1 0 0 0 [5,] 10 1 0 0 0 [6,] 4 1 0 0 0 [7,] 6 1 0 0 0 [8,] 5 1 0 0 0 [9,] 7 1 0 0 0 [10,] 7 1 0 0 0 [11,] 7 0 1 0 0 [12,] 9 0 1 0 0 [13,] 6 0 1 0 0 [14,] 6 0 1 0 0 [15,] 6 0 1 0 0 [16,] 11 0 1 0 0 [17,] 6 0 1 0 0 [18,] 3 0 1 0 0 [19,] 8 0 1 0 0 [20,] 7 0 1 0 0 [21,] 11 0 0 1 0 [22,] 13 0 0 1 0 [23,] 8 0 0 1 0 [24,] 6 0 0 1 0 [25,] 14 0 0 1 0 [26,] 11 0 0 1 0 [27,] 13 0 0 1 0 [28,] 13 0 0 1 0 [29,] 10 0 0 1 0 [30,] 11 0 0 1 0 [31,] 12 0 0 0 1 [32,] 11 0 0 0 1 [33,] 16 0 0 0 1 [34,] 11 0 0 0 1 [35,] 9 0 0 0 1 [36,] 23 0 0 0 1 [37,] 12 0 0 0 1 [38,] 10 0 0 0 1 [39,] 19 0 0 0 1 [40,] 11 0 0 0 1 [41,] 10 0 0 0 0 [42,] 19 0 0 0 0 [43,] 14 0 0 0 0 [44,] 5 0 0 0 0 [45,] 10 0 0 0 0 [46,] 11 0 0 0 0 [47,] 14 0 0 0 0 [48,] 15 0 0 0 0 [49,] 11 0 0 0 0 [50,] 11 0 0 0 0 # If we look at the coefficient estimates, we see that they # give back the same group means we've seen before. For example, # the conditional mean for the Counting group is # 12 + 1*(-5) + 0*(-5.1) + 0*(-1) + 0*1.4 = 7.0. > lm(Score~d1+d2+d3+d4) Call: lm(formula = Score ~ d1 + d2 + d3 + d4) Coefficients: (Intercept) d1 d2 d3 d4 12.0 -5.0 -5.1 -1.0 1.4 > tapply(Score,Group,mean) adjective counting imagery intent rhyming 11.0 7.0 13.4 12.0 6.9 # And here, we get the same ANOVA result as before. (It's important # to remember that we don't interpret the individual coefficient # tests in this output, just the overall F of 9.085 that appears at # the bottom of the output.) > summary(lm(Score~d1+d2+d3+d4)) Call: lm(formula = Score ~ d1 + d2 + d3 + d4) Residuals: Min 1Q Median 3Q Max -7.00 -1.85 -0.45 2.00 9.60 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 12.0000 0.9835 12.201 7.18e-16 *** d1 -5.0000 1.3909 -3.595 0.000802 *** d2 -5.1000 1.3909 -3.667 0.000647 *** d3 -1.0000 1.3909 -0.719 0.475890 d4 1.4000 1.3909 1.007 0.319544 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 3.11 on 45 degrees of freedom Multiple R-squared: 0.4468, Adjusted R-squared: 0.3976 F-statistic: 9.085 on 4 and 45 DF, p-value: 1.815e-05 # We confirm that we have a match to our previous output: > anova(lm(Score~Group)) Analysis of Variance Table Response: Score Df Sum Sq Mean Sq F value Pr(>F) Group 4 351.52 87.880 9.0848 1.815e-05 *** Residuals 45 435.30 9.673 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # An "effects coding" system replaces the codes for the last # group (all zeroes under dummy coding) with all -1's: > e1 <- d1; e2 <- d2; e3 <- d3; e4 <- d4 > e1[41:50] <- -1 > e2[41:50] <- -1 > e3[41:50] <- -1 > e4[41:50] <- -1 > cbind(Score,e1,e2,e3,e4) Score e1 e2 e3 e4 [1,] 9 1 0 0 0 [2,] 8 1 0 0 0 [3,] 6 1 0 0 0 [4,] 8 1 0 0 0 [5,] 10 1 0 0 0 [6,] 4 1 0 0 0 [7,] 6 1 0 0 0 [8,] 5 1 0 0 0 [9,] 7 1 0 0 0 [10,] 7 1 0 0 0 [11,] 7 0 1 0 0 [12,] 9 0 1 0 0 [13,] 6 0 1 0 0 [14,] 6 0 1 0 0 [15,] 6 0 1 0 0 [16,] 11 0 1 0 0 [17,] 6 0 1 0 0 [18,] 3 0 1 0 0 [19,] 8 0 1 0 0 [20,] 7 0 1 0 0 [21,] 11 0 0 1 0 [22,] 13 0 0 1 0 [23,] 8 0 0 1 0 [24,] 6 0 0 1 0 [25,] 14 0 0 1 0 [26,] 11 0 0 1 0 [27,] 13 0 0 1 0 [28,] 13 0 0 1 0 [29,] 10 0 0 1 0 [30,] 11 0 0 1 0 [31,] 12 0 0 0 1 [32,] 11 0 0 0 1 [33,] 16 0 0 0 1 [34,] 11 0 0 0 1 [35,] 9 0 0 0 1 [36,] 23 0 0 0 1 [37,] 12 0 0 0 1 [38,] 10 0 0 0 1 [39,] 19 0 0 0 1 [40,] 11 0 0 0 1 [41,] 10 -1 -1 -1 -1 [42,] 19 -1 -1 -1 -1 [43,] 14 -1 -1 -1 -1 [44,] 5 -1 -1 -1 -1 [45,] 10 -1 -1 -1 -1 [46,] 11 -1 -1 -1 -1 [47,] 14 -1 -1 -1 -1 [48,] 15 -1 -1 -1 -1 [49,] 11 -1 -1 -1 -1 [50,] 11 -1 -1 -1 -1 # We still get the same F statistic. But now, if we have # the same sample size in each group (10 in this case), the # intercept represents the grand mean of all 50 scores, and # the slopes are deviations from that grand mean (the "effect" # of being in the particular group: > summary(lm(Score~e1+e2+e3+e4)) Call: lm(formula = Score ~ e1 + e2 + e3 + e4) Residuals: Min 1Q Median 3Q Max -7.00 -1.85 -0.45 2.00 9.60 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 10.0600 0.4398 22.872 < 2e-16 *** e1 -3.0600 0.8797 -3.478 0.001131 ** e2 -3.1600 0.8797 -3.592 0.000808 *** e3 0.9400 0.8797 1.069 0.290971 e4 3.3400 0.8797 3.797 0.000436 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 3.11 on 45 degrees of freedom Multiple R-squared: 0.4468, Adjusted R-squared: 0.3976 F-statistic: 9.085 on 4 and 45 DF, p-value: 1.815e-05 > mean(Score) [1] 10.06 # If we want the effect of the last group, we can get it # by taking -1 times the sum of the other effects: > -(-3.06-3.16+0.94+3.34) [1] 1.94 > 10.06+1.94 [1] 12 # As was the case for the t test, ANY set of number-of-groups-minus-one # predictors that identifies the groups with unique patterns will get # us the same ANOVA: > n1 <- c(rep(pi,10),rep(33,10),rep(4273,10),rep(99,10),rep(3333,10)) > n2 <- c(rep(33,10),rep(4273,10),rep(99,10),rep(pi,10),rep(12,10)) > n3 <- d3 > n4 <- e4 > summary(lm(Score~n1+n2+n3+n4)) Call: lm(formula = Score ~ n1 + n2 + n3 + n4) Residuals: Min 1Q Median 3Q Max -7.00 -1.85 -0.45 2.00 9.60 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 6.991e+00 9.929e-01 7.041 8.86e-09 *** n1 3.327e-03 7.060e-04 4.713 2.38e-05 *** n2 -4.701e-05 3.252e-04 -0.145 0.885688 n3 -1.020e+01 2.493e+00 -4.092 0.000175 *** n4 6.080e+00 1.337e+00 4.546 4.10e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 3.11 on 45 degrees of freedom Multiple R-squared: 0.4468, Adjusted R-squared: 0.3976 F-statistic: 9.085 on 4 and 45 DF, p-value: 1.815e-05 >