# Here's that Eysenck memory data set again:

> attach(Eysenck)
> Eysenck
   Score     Group
1      9  counting
2      8  counting
3      6  counting
4      8  counting
5     10  counting
6      4  counting
7      6  counting
8      5  counting
9      7  counting
10     7  counting
11     7   rhyming
12     9   rhyming
13     6   rhyming
14     6   rhyming
15     6   rhyming
16    11   rhyming
17     6   rhyming
18     3   rhyming
19     8   rhyming
20     7   rhyming
21    11 adjective
22    13 adjective
23     8 adjective
24     6 adjective
25    14 adjective
26    11 adjective
27    13 adjective
28    13 adjective
29    10 adjective
30    11 adjective
31    12   imagery
32    11   imagery
33    16   imagery
34    11   imagery
35     9   imagery
36    23   imagery
37    12   imagery
38    10   imagery
39    19   imagery
40    11   imagery
41    10    intent
42    19    intent
43    14    intent
44     5    intent
45    10    intent
46    11    intent
47    14    intent
48    15    intent
49    11    intent
50    11    intent

# If we visualize the groups using parallel boxplots, R will put
# the groups in alphabetical order, which is not how we want to
# think about them:

> boxplot(Score~Group)

# We can get around that by creating a "group" variable that R
# will see in the same order that we want:

> group <- c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
> boxplot(Score~group)

# But now we don't have good axis labels. We can correct that by
# suppressing the axes...

> par(pin=.8*c(6,4))
> boxplot(Score~group,xlab="Group",axes=FALSE)

# We add back the vertical axis and frame the plot:
> axis(side=2,at=c(5,10,15,20))
> box()

# And now we make a lucky guess that the positions of the plots
# are at locations 1, 2, 3, 4, and 5 on the X-axis, and we add
# informative labels:

> axis(side=1,at=1:5,lab=c("Counting","Rhyming","Adjective","Imagery","Intentional"))

# One shortcoming of boxplots to depict the ANOVA is that they
# center on the median, not the mean. But it's easy to add the
# means to the plot:

> points(1:5,tapply(Score,group,mean))

# A barplot is another way to show the means. (These are often
# done with whiskers to indicate one standard error or one standard
# deviation, but in R that requires a special package.)

> barplot(tapply(Score,group,mean))
> abline(h=0)

# (We could use the same approach to improve labeling that we did
# above with the boxplots.)

# Another way to depict the means is with a line plot:

> plot(1:5,tapply(Score,group,mean),type='b')

# (There's a lot we could have done using familiar techniques to
# improve that plot: meaningful labels, more white space, etc. But
# we didn't take the time.)


# A student asked if we could add inferential information
# to the plot. Here's the ANOVA:

> anova(lm(Score~Group))
Analysis of Variance Table

Response: Score
          Df Sum Sq Mean Sq F value    Pr(>F)    
Group      4 351.52  87.880  9.0848 1.815e-05 ***
Residuals 45 435.30   9.673                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# We add the F statistic and p value to the plot:

> text(4,7.5,"F(4,45) = 9.08, p < .001")

# One post hoc test that is available in base R is
# Tukey's HSD:

> TukeyHSD(aov(lm(Score~Group)))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = lm(Score ~ Group))

$Group
                   diff        lwr         upr     p adj
counting-adjective -4.0  -7.952238 -0.04776201 0.0460196
imagery-adjective   2.4  -1.552238  6.35223799 0.4291513
intent-adjective    1.0  -2.952238  4.95223799 0.9510451
rhyming-adjective  -4.1  -8.052238 -0.14776201 0.0385792
imagery-counting    6.4   2.447762 10.35223799 0.0003180
intent-counting     5.0   1.047762  8.95223799 0.0068354
rhyming-counting   -0.1  -4.052238  3.85223799 0.9999937
intent-imagery     -1.4  -5.352238  2.55223799 0.8510119
rhyming-imagery    -6.5 -10.452238 -2.54776201 0.0002524
rhyming-intent     -5.1  -9.052238 -1.14776201 0.0055623

# Many other post hoc procedures are available in the package "DescTools,"
# but the package installation hung when I attempted to demonstrate:

> utils:::menuInstallPkgs()
--- Please select a CRAN mirror for use in this session ---
also installing the dependencies ‘proxy’, ‘rootSolve’, ‘e1071’, ‘lmom’, ‘mvtnorm’, ‘expm’, ‘Rcpp’, ‘rstudioapi’, ‘Exact’, ‘gld’, ‘data.table’, ‘BH’

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/proxy_0.4-25.zip'
Content type 'application/zip' length 244671 bytes (238 KB)
downloaded 238 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/rootSolve_1.8.2.1.zip'
Content type 'application/zip' length 789394 bytes (770 KB)
downloaded 770 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/e1071_1.7-6.zip'
Content type 'application/zip' length 1022784 bytes (998 KB)
downloaded 998 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/lmom_2.8.zip'
Content type 'application/zip' length 592986 bytes (579 KB)
downloaded 579 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/mvtnorm_1.1-1.zip'
Content type 'application/zip' length 268952 bytes (262 KB)
downloaded 262 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/expm_0.999-6.zip'
Content type 'application/zip' length 239114 bytes (233 KB)
downloaded 233 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/Rcpp_1.0.6.zip'
Content type 'application/zip' length 3253620 bytes (3.1 MB)
downloaded 3.1 MB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/rstudioapi_0.13.zip'
Content type 'application/zip' length 302884 bytes (295 KB)
downloaded 295 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/Exact_2.1.zip'
Content type 'application/zip' length 107178 bytes (104 KB)
downloaded 104 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/gld_2.6.2.zip'
Content type 'application/zip' length 251894 bytes (245 KB)
downloaded 245 KB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/data.table_1.14.0.zip'
Content type 'application/zip' length 2602848 bytes (2.5 MB)
downloaded 2.5 MB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/BH_1.75.0-0.zip'
Content type 'application/zip' length 19674958 bytes (18.8 MB)
downloaded 18.8 MB

trying URL 'https://ftp.osuosl.org/pub/cran/bin/windows/contrib/4.0/DescTools_0.99.41.zip'
Content type 'application/zip' length 5984299 bytes (5.7 MB)
downloaded 5.7 MB

package ‘proxy’ successfully unpacked and MD5 sums checked
package ‘rootSolve’ successfully unpacked and MD5 sums checked
package ‘e1071’ successfully unpacked and MD5 sums checked
package ‘lmom’ successfully unpacked and MD5 sums checked
package ‘mvtnorm’ successfully unpacked and MD5 sums checked
package ‘expm’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘rstudioapi’ successfully unpacked and MD5 sums checked
package ‘Exact’ successfully unpacked and MD5 sums checked
package ‘gld’ successfully unpacked and MD5 sums checked
package ‘data.table’ successfully unpacked and MD5 sums checked


> library(DescTools)
Error in library(DescTools) : there is no package called ‘DescTools’

# Here, we create a system of four dummy codes that allow
# us to implement the ANOVA as a multiple regression problem:

> d1 <- c(rep(1,10),rep(0,40))
> d2 <- c(rep(0,10),rep(1,10),rep(0,30))
> d3 <- c(rep(0,20),rep(1,10),rep(0,20))
> d4 <- c(rep(0,30),rep(1,10),rep(0,10))
> cbind(Score,d1,d2,d3,d4)
      Score d1 d2 d3 d4
 [1,]     9  1  0  0  0
 [2,]     8  1  0  0  0
 [3,]     6  1  0  0  0
 [4,]     8  1  0  0  0
 [5,]    10  1  0  0  0
 [6,]     4  1  0  0  0
 [7,]     6  1  0  0  0
 [8,]     5  1  0  0  0
 [9,]     7  1  0  0  0
[10,]     7  1  0  0  0
[11,]     7  0  1  0  0
[12,]     9  0  1  0  0
[13,]     6  0  1  0  0
[14,]     6  0  1  0  0
[15,]     6  0  1  0  0
[16,]    11  0  1  0  0
[17,]     6  0  1  0  0
[18,]     3  0  1  0  0
[19,]     8  0  1  0  0
[20,]     7  0  1  0  0
[21,]    11  0  0  1  0
[22,]    13  0  0  1  0
[23,]     8  0  0  1  0
[24,]     6  0  0  1  0
[25,]    14  0  0  1  0
[26,]    11  0  0  1  0
[27,]    13  0  0  1  0
[28,]    13  0  0  1  0
[29,]    10  0  0  1  0
[30,]    11  0  0  1  0
[31,]    12  0  0  0  1
[32,]    11  0  0  0  1
[33,]    16  0  0  0  1
[34,]    11  0  0  0  1
[35,]     9  0  0  0  1
[36,]    23  0  0  0  1
[37,]    12  0  0  0  1
[38,]    10  0  0  0  1
[39,]    19  0  0  0  1
[40,]    11  0  0  0  1
[41,]    10  0  0  0  0
[42,]    19  0  0  0  0
[43,]    14  0  0  0  0
[44,]     5  0  0  0  0
[45,]    10  0  0  0  0
[46,]    11  0  0  0  0
[47,]    14  0  0  0  0
[48,]    15  0  0  0  0
[49,]    11  0  0  0  0
[50,]    11  0  0  0  0

# If we look at the coefficient estimates, we see that they
# give back the same group means we've seen before. For example,
# the conditional mean for the Counting group is
# 12 + 1*(-5) + 0*(-5.1) + 0*(-1) + 0*1.4 = 7.0.

> lm(Score~d1+d2+d3+d4)

Call:
lm(formula = Score ~ d1 + d2 + d3 + d4)

Coefficients:
(Intercept)           d1           d2           d3           d4  
       12.0         -5.0         -5.1         -1.0          1.4  

> tapply(Score,Group,mean)
adjective  counting   imagery    intent   rhyming 
     11.0       7.0      13.4      12.0       6.9 

# And here, we get the same ANOVA result as before. (It's important
# to remember that we don't interpret the individual coefficient
# tests in this output, just the overall F of 9.085 that appears at
# the bottom of the output.)

> summary(lm(Score~d1+d2+d3+d4))

Call:
lm(formula = Score ~ d1 + d2 + d3 + d4)

Residuals:
   Min     1Q Median     3Q    Max 
 -7.00  -1.85  -0.45   2.00   9.60 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.0000     0.9835  12.201 7.18e-16 ***
d1           -5.0000     1.3909  -3.595 0.000802 ***
d2           -5.1000     1.3909  -3.667 0.000647 ***
d3           -1.0000     1.3909  -0.719 0.475890    
d4            1.4000     1.3909   1.007 0.319544    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.11 on 45 degrees of freedom
Multiple R-squared:  0.4468,    Adjusted R-squared:  0.3976 
F-statistic: 9.085 on 4 and 45 DF,  p-value: 1.815e-05

# We confirm that we have a match to our previous output:

> anova(lm(Score~Group))
Analysis of Variance Table

Response: Score
          Df Sum Sq Mean Sq F value    Pr(>F)    
Group      4 351.52  87.880  9.0848 1.815e-05 ***
Residuals 45 435.30   9.673                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 
# An "effects coding" system replaces the codes for the last
# group (all zeroes under dummy coding) with all -1's:
 
> e1 <- d1; e2 <- d2; e3 <- d3; e4 <- d4
> e1[41:50] <- -1
> e2[41:50] <- -1
> e3[41:50] <- -1
> e4[41:50] <- -1
> cbind(Score,e1,e2,e3,e4)
      Score e1 e2 e3 e4
 [1,]     9  1  0  0  0
 [2,]     8  1  0  0  0
 [3,]     6  1  0  0  0
 [4,]     8  1  0  0  0
 [5,]    10  1  0  0  0
 [6,]     4  1  0  0  0
 [7,]     6  1  0  0  0
 [8,]     5  1  0  0  0
 [9,]     7  1  0  0  0
[10,]     7  1  0  0  0
[11,]     7  0  1  0  0
[12,]     9  0  1  0  0
[13,]     6  0  1  0  0
[14,]     6  0  1  0  0
[15,]     6  0  1  0  0
[16,]    11  0  1  0  0
[17,]     6  0  1  0  0
[18,]     3  0  1  0  0
[19,]     8  0  1  0  0
[20,]     7  0  1  0  0
[21,]    11  0  0  1  0
[22,]    13  0  0  1  0
[23,]     8  0  0  1  0
[24,]     6  0  0  1  0
[25,]    14  0  0  1  0
[26,]    11  0  0  1  0
[27,]    13  0  0  1  0
[28,]    13  0  0  1  0
[29,]    10  0  0  1  0
[30,]    11  0  0  1  0
[31,]    12  0  0  0  1
[32,]    11  0  0  0  1
[33,]    16  0  0  0  1
[34,]    11  0  0  0  1
[35,]     9  0  0  0  1
[36,]    23  0  0  0  1
[37,]    12  0  0  0  1
[38,]    10  0  0  0  1
[39,]    19  0  0  0  1
[40,]    11  0  0  0  1
[41,]    10 -1 -1 -1 -1
[42,]    19 -1 -1 -1 -1
[43,]    14 -1 -1 -1 -1
[44,]     5 -1 -1 -1 -1
[45,]    10 -1 -1 -1 -1
[46,]    11 -1 -1 -1 -1
[47,]    14 -1 -1 -1 -1
[48,]    15 -1 -1 -1 -1
[49,]    11 -1 -1 -1 -1
[50,]    11 -1 -1 -1 -1

# We still get the same F statistic. But now, if we have
# the same sample size in each group (10 in this case), the
# intercept represents the grand mean of all 50 scores, and
# the slopes are deviations from that grand mean (the "effect"
# of being in the particular group:

> summary(lm(Score~e1+e2+e3+e4))

Call:
lm(formula = Score ~ e1 + e2 + e3 + e4)

Residuals:
   Min     1Q Median     3Q    Max 
 -7.00  -1.85  -0.45   2.00   9.60 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10.0600     0.4398  22.872  < 2e-16 ***
e1           -3.0600     0.8797  -3.478 0.001131 ** 
e2           -3.1600     0.8797  -3.592 0.000808 ***
e3            0.9400     0.8797   1.069 0.290971    
e4            3.3400     0.8797   3.797 0.000436 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.11 on 45 degrees of freedom
Multiple R-squared:  0.4468,    Adjusted R-squared:  0.3976 
F-statistic: 9.085 on 4 and 45 DF,  p-value: 1.815e-05

> mean(Score)
[1] 10.06

# If we want the effect of the last group, we can get it
# by taking -1 times the sum of the other effects:

> -(-3.06-3.16+0.94+3.34)
[1] 1.94
> 10.06+1.94
[1] 12

# As was the case for the t test, ANY set of number-of-groups-minus-one
# predictors that identifies the groups with unique patterns will get
# us the same ANOVA:

> n1 <- c(rep(pi,10),rep(33,10),rep(4273,10),rep(99,10),rep(3333,10))
> n2 <- c(rep(33,10),rep(4273,10),rep(99,10),rep(pi,10),rep(12,10))
> n3 <- d3
> n4 <- e4

> summary(lm(Score~n1+n2+n3+n4))

Call:
lm(formula = Score ~ n1 + n2 + n3 + n4)

Residuals:
   Min     1Q Median     3Q    Max 
 -7.00  -1.85  -0.45   2.00   9.60 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.991e+00  9.929e-01   7.041 8.86e-09 ***
n1           3.327e-03  7.060e-04   4.713 2.38e-05 ***
n2          -4.701e-05  3.252e-04  -0.145 0.885688    
n3          -1.020e+01  2.493e+00  -4.092 0.000175 ***
n4           6.080e+00  1.337e+00   4.546 4.10e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.11 on 45 degrees of freedom
Multiple R-squared:  0.4468,    Adjusted R-squared:  0.3976 
F-statistic: 9.085 on 4 and 45 DF,  p-value: 1.815e-05

>