# We reviewed the dummy coding system we had created for
# the math practice ANOVA:

> MassedDummy
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> SpacedDummy
 [1] 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
> attach(math)

# Here are the ANOVA results letting R choose a way to
# code Practice. Note that R just gives us an overall,
# 2 df test of the whole model:

> anova(lm(Time~Practice))
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value   Pr(>F)   
Practice   2  13772  6885.9  5.8504 0.009559 **
Residuals 21  24717  1177.0                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# When we do the same thing using our dummy variables,
# R doesn't know that this is an ANOVA as opposed to an
# ordinary regression, so it tries to give us an F test
# for each predictor rather than an overall F test:

> anova(lm(Time~MassedDummy+SpacedDummy))
Analysis of Variance Table

Response: Time
            Df  Sum Sq Mean Sq F value   Pr(>F)   
MassedDummy  1 12384.2 12384.2 10.5219 0.003888 **
SpacedDummy  1  1387.6  1387.6  1.1789 0.289885   
Residuals   21 24716.8  1177.0                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# If we apply the summary() wrapper to the model in which R
# chose the parameterization for Practice, we DO see separate
# coefficients; but we need to remember that under ANOVA rules,
# we can't interpret the tests, and we don't really know what
# the coefficients mean, because we don't know how R parameterized
# the problem:

> summary(lm(Time~Practice))

Call:
lm(formula = Time ~ Practice)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)       55.12      12.13   4.545 0.000177 ***
Practicenone      57.50      17.15   3.352 0.003019 ** 
Practicespaced    38.88      17.15   2.266 0.034122 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559

# We can get the overall F test from our dummy coded
# version by looking at the bottom of the summary()
# output, but again, we should not interpret the tests
# on the slopes.

> summary(lm(Time~MassedDummy+SpacedDummy))

Call:
lm(formula = Time ~ MassedDummy + SpacedDummy)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   112.62      12.13   9.285 6.98e-09 ***
MassedDummy   -57.50      17.15  -3.352  0.00302 ** 
SpacedDummy   -18.62      17.15  -1.086  0.28989    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559

# Previously, we dummy coded Massed and Spaced practice,
# and let the None group be identified by 0, 0 on those
# two variables. We could just as well have coded the None
# group, and ignored one of the other dummy variables:

> NoneDummy <- c(rep(0,16),rep(1,8))
> NoneDummy
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

# We can get the same ANOVA with any two of our three dummy
# variables:

> summary(lm(Time~MassedDummy+NoneDummy))

Call:
lm(formula = Time ~ MassedDummy + NoneDummy)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    94.00      12.13   7.750 1.37e-07 ***
MassedDummy   -38.87      17.15  -2.266   0.0341 *  
NoneDummy      18.62      17.15   1.086   0.2899    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559
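# (A sketch, not run in the session above: we could also ask R directly
# how it parameterizes a factor. contrasts() prints the coding matrix R
# will use, and model.matrix() prints the predictor columns it actually
# builds. For Practice, R's default "treatment" coding makes the first
# alphabetical level, massed, the 0, 0 baseline group.)

contrasts(Practice)      # default contr.treatment coding for the factor
model.matrix(~Practice)  # the dummy columns R constructs behind the scenes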
# Here's another way to do it. Note that this matches the
# output from the model where we let R do the coding, so now
# we know how R parameterized the problem:

> summary(lm(Time~SpacedDummy+NoneDummy))

Call:
lm(formula = Time ~ SpacedDummy + NoneDummy)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    55.12      12.13   4.545 0.000177 ***
SpacedDummy    38.87      17.15   2.266 0.034122 *  
NoneDummy      57.50      17.15   3.352 0.003019 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559

# Regardless of which coding system we use, we can recover
# the group means (with a little rounding error) by calculating
# the conditional means predicted by the regression model:

> tapply(Time, Practice, mean)
 massed    none  spaced 
 55.125 112.625  94.000 
> 55.125 + 38.87
[1] 93.995
> 55.125 + 57.5
[1] 112.625

# Another coding system that can be interesting is called
# "effects coding." We can create an effects coding system
# by replacing the 0's with -1's in the group that has zeros
# for both dummy variables (here, the None group):

> e1 <- c(rep(1,8),rep(0,8),rep(-1,8))
> e1
 [1]  1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0 -1 -1 -1 -1 -1 -1 -1 -1
> e2 <- c(rep(0,8),rep(1,8),rep(-1,8))
> e2
 [1]  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1

# If we regress on e1 and e2, we get the same ANOVA, but this
# time, the intercept is the grand mean (i.e., the mean of all
# 24 observations without regard to group membership), and the
# slopes represent the "effect" of being in a particular group:

> summary(lm(Time~e1+e2))

Call:
lm(formula = Time ~ e1 + e2)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   87.250      7.003  12.459 3.63e-11 ***
e1           -32.125      9.904  -3.244  0.00389 ** 
e2             6.750      9.904   0.682  0.50296    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559
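# (A sketch, not run above: R has effects coding built in as contr.sum,
# so we did not actually need to build e1 and e2 by hand. One caution if
# we tried it: contr.sum assigns the -1's to the LAST level in
# alphabetical order, which here is Spaced rather than None, so the two
# slopes would be the Massed and None effects instead.)

summary(lm(Time ~ C(Practice, contr.sum)))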
# As stated above, the intercept is the grand mean:

> mean(Time)
[1] 87.25

# The "effect" of being in the Massed group is -32.125:

> 87.25 - 32.125
[1] 55.125

# And the effect of being in the Spaced group is 6.75:

> 87.25 + 6.75
[1] 94

# There isn't a slope that represents the effect of being
# in the None group, but if we wanted to know that, we could
# simply multiply the sum of the other effects by -1:

> -(-32.125+6.75)
[1] 25.375
> 87.25 + 25.375
[1] 112.625

# Just as was true for the t test, ANY coding system that uses
# as many variables as the number of groups minus 1 to identify
# each group by a unique pattern will accomplish the same ANOVA
# (although there is no particular reason we would want to use
# an arbitrary coding like this one):

> nonsense1 <- c(rep(32,8),rep(2389,8),rep(-6,8))
> nonsense2 <- c(rep(pi,8),rep(0,8),rep(11,8))
> nonsense1
 [1]   32   32   32   32   32   32   32   32 2389 2389 2389 2389 2389 2389 2389
[16] 2389   -6   -6   -6   -6   -6   -6   -6   -6
> nonsense2
 [1]  3.141593  3.141593  3.141593  3.141593  3.141593  3.141593  3.141593
 [8]  3.141593  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[15]  0.000000  0.000000 11.000000 11.000000 11.000000 11.000000 11.000000
[22] 11.000000 11.000000 11.000000
> summary(lm(Time~nonsense1+nonsense2))

Call:
lm(formula = Time ~ nonsense1 + nonsense2)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 30.891327  18.014850   1.715  0.10111   
nonsense1    0.026416   0.009148   2.888  0.00881 **
nonsense2    7.444743   2.214921   3.361  0.00296 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559
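# (A sketch, not run above, of one way to verify that all of these
# codings fit the same model: any full-rank coding predicts the same
# three group means, so the fitted values should agree across models.)

all.equal(fitted(lm(Time ~ MassedDummy + SpacedDummy)),
          fitted(lm(Time ~ nonsense1 + nonsense2)))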
# In response to a question from class, I showed a table of the
# dummy and nonsense coding systems so that we could see that
# each group is identified by a unique pattern. The Massed
# group has the values 1, 0 in the dummy coding system; the
# Spaced group has the values 0, 1; and the None group has the
# values 0, 0. Because each group has a unique pattern, the
# regression will result in the ANOVA. For the nonsense coding
# system, Massed is identified by the values 32, pi; Spaced is
# identified by 2389, 0; and None is identified by -6, 11. Again,
# each pattern is unique, so the regression accomplishes the ANOVA.

> cbind(as.character(Practice),MassedDummy,SpacedDummy,nonsense1,nonsense2)
               MassedDummy SpacedDummy nonsense1 nonsense2          
 [1,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [2,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [3,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [4,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [5,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [6,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [7,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [8,] "massed" "1"         "0"         "32"      "3.14159265358979" 
 [9,] "spaced" "0"         "1"         "2389"    "0"                
[10,] "spaced" "0"         "1"         "2389"    "0"                
[11,] "spaced" "0"         "1"         "2389"    "0"                
[12,] "spaced" "0"         "1"         "2389"    "0"                
[13,] "spaced" "0"         "1"         "2389"    "0"                
[14,] "spaced" "0"         "1"         "2389"    "0"                
[15,] "spaced" "0"         "1"         "2389"    "0"                
[16,] "spaced" "0"         "1"         "2389"    "0"                
[17,] "none"   "0"         "0"         "-6"      "11"               
[18,] "none"   "0"         "0"         "-6"      "11"               
[19,] "none"   "0"         "0"         "-6"      "11"               
[20,] "none"   "0"         "0"         "-6"      "11"               
[21,] "none"   "0"         "0"         "-6"      "11"               
[22,] "none"   "0"         "0"         "-6"      "11"               
[23,] "none"   "0"         "0"         "-6"      "11"               
[24,] "none"   "0"         "0"         "-6"      "11"               

# We calculated two contrasts using the defining formula from
# the PowerPoint (slide three). The first contrast compares the
# combined Massed and Spaced practice groups with the None group:

> SSc1 <- (1*mean(Time[1:8]) + 1*mean(Time[9:16]) - 2*mean(Time[17:24]))^2
> SSc1 <- SSc1 / (1^2/8 + 1^2/8 + (-2)^2/8)
> SSc1
[1] 7726.688

# Any contrast has 1 df, so this is already a mean square. We
# can calculate an F statistic to test the contrast by dividing
# by the ANOVA error mean square. (The 1177.0 here comes from the
# ANOVA output that we have already seen in this transcript.)

> Fc1 <- SSc1/1177.0
> Fc1
[1] 6.56473

# The contrast is significant, so we can reject the null hypothesis
# and conclude that practice differs from no practice:

> 1-pf(Fc1, 1, 21)
[1] 0.01815602

# Here's the second contrast, which compares the Massed and Spaced
# groups:

> SSc2 <- (1*mean(Time[1:8]) + (-1)*mean(Time[9:16]) + 0*mean(Time[17:24]))^2
> SSc2 <- SSc2 / (1^2/8 + 1^2/8 + 0^2/8)
> SSc2
[1] 6045.062
> Fc2 <- SSc2 / 1177
> Fc2
[1] 5.135992

# It, too, is significant:

> 1-pf(Fc2, 1, 21)
[1] 0.03412287

# If these contrasts were planned, but not orthogonal, we would need
# to divide our .05 alpha by the number of contrasts, and the second
# one would not be significant. But because these contrasts ARE
# orthogonal (see slide six for the check of orthogonality), and were
# specified in advance, we can interpret the tests without such an
# adjustment.
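# (A sketch of the defining formula wrapped in a reusable function.
# contrastF is a hypothetical helper, not something we wrote in class:
# 'w' holds the contrast weights, 'means' the group means, 'n' the
# per-group sample sizes, and 'MSe' the ANOVA error mean square.)

contrastF <- function(w, means, n, MSe) {
  SSc <- sum(w * means)^2 / sum(w^2 / n)  # 1-df contrast SS (= its MS)
  SSc / MSe                               # F statistic on 1 and error df
}
# Reordering the tapply() means to match the weights (massed, spaced,
# none), this reproduces Fc1 = 6.56473:
contrastF(c(1, 1, -2),
          tapply(Time, Practice, mean)[c("massed", "spaced", "none")],
          8, 1177.0)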
# Here, we created a "contrast coding system" to help us understand
# the importance of orthogonality. The variable c1 has the values 1
# for the Massed and Spaced groups and -2 for the None group (the
# same as our first contrast coefficients). The variable c2 has the
# values 1 for Massed, -1 for Spaced, and 0 for None, just like our
# second set of contrast coefficients. Because these two variables
# identify each group with a unique pattern of values (1, 1 for
# Massed, 1, -1 for Spaced, and -2, 0 for None), the regression will
# produce the same ANOVA.

> c1 <- c(rep(1,16),rep(-2,8))
> c2 <- c(rep(1,8),rep(-1,8),rep(0,8))
> cbind(as.character(Practice),c1,c2)
               c1   c2  
 [1,] "massed" "1"  "1" 
 [2,] "massed" "1"  "1" 
 [3,] "massed" "1"  "1" 
 [4,] "massed" "1"  "1" 
 [5,] "massed" "1"  "1" 
 [6,] "massed" "1"  "1" 
 [7,] "massed" "1"  "1" 
 [8,] "massed" "1"  "1" 
 [9,] "spaced" "1"  "-1"
[10,] "spaced" "1"  "-1"
[11,] "spaced" "1"  "-1"
[12,] "spaced" "1"  "-1"
[13,] "spaced" "1"  "-1"
[14,] "spaced" "1"  "-1"
[15,] "spaced" "1"  "-1"
[16,] "spaced" "1"  "-1"
[17,] "none"   "-2" "0" 
[18,] "none"   "-2" "0" 
[19,] "none"   "-2" "0" 
[20,] "none"   "-2" "0" 
[21,] "none"   "-2" "0" 
[22,] "none"   "-2" "0" 
[23,] "none"   "-2" "0" 
[24,] "none"   "-2" "0" 
> summary(lm(Time~c1+c2))

Call:
lm(formula = Time ~ c1 + c2)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   87.250      7.003  12.459 3.63e-11 ***
c1           -12.688      4.952  -2.562   0.0182 *  
c2           -19.438      8.577  -2.266   0.0341 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559

# But this time, the t tests on the slopes have exactly the
# same p values as the F tests for the contrasts that we
# calculated above. That's because these t tests are just the
# square roots of the F statistics (subject to a bit of rounding
# error here; also note that ^ binds more tightly than the unary
# minus in R, so -2.562^2 means -(2.562^2) and prints as negative):

> -2.562^2
[1] -6.563844
> Fc1
[1] 6.56473

# When we checked our contrast coefficients for orthogonality, what
# we were actually doing was confirming that the variables in the
# corresponding contrast coding system are uncorrelated:

> cor(c1,c2)
[1] 0

# Remember the artificial data set we used when we first introduced
# multiple regression, where we saw that if the predictors (here, x1
# and x2) are perfectly uncorrelated, the slopes in simple regressions
# were the same as the slopes in the multiple regression?

> uncor
    y x1 x2
1  15  1  1
2 -30  1 10
3  30  3  2
4  -5  3  9
5  45  5  3
6  20  5  8
7  60  7  4
8  45  7  7
9  75  9  5
10 70  9  6
> cor(uncor$x1,uncor$x2)
[1] 0

# That was true because the two variables were trying to help
# us understand completely separate parts of the variation in y.
# The same idea holds for orthogonal contrasts: c1 and c2 are
# helping us understand completely different parts of the variation
# in Time, so it's almost as if we had a different sample of people
# for each question.

# If we have a set of orthogonal contrasts, we can make R calculate
# them by forcing it to use a contrast coding parameterization. Here,
# we create a matrix that contains the contrasts we are interested
# in (keeping in mind that R thinks about levels of the factor in
# alphabetical order):

> #  Massed None Spaced
> mycontrasts <- matrix(c(
+   1, -2,  1,
+   1,  0, -1), 2, 3, byrow=TRUE)
> mycontrasts
     [,1] [,2] [,3]
[1,]    1   -2    1
[2,]    1    0   -1

# If we "transpose" these (i.e., rotate them into the same orientation
# they had when we created c1 and c2), we can use a "contrasts" command
# to force R to use that parameterization for the Practice factor:

> t(mycontrasts)
     [,1] [,2]
[1,]    1    1
[2,]   -2    0
[3,]    1   -1
> contrasts(Practice) <- t(mycontrasts)
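# (A sketch, not run above: the same parameterization can also be
# requested for a single model, without modifying the Practice factor
# itself, by passing the coding matrix through lm()'s contrasts
# argument.)

lm(Time ~ Practice, contrasts = list(Practice = t(mycontrasts)))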
# Note that the results for "Practice1" and "Practice2"
# here are the same as the results for c1 and c2 in our
# earlier analysis:

> summary(lm(Time~Practice))

Call:
lm(formula = Time ~ Practice)

Residuals:
   Min     1Q Median     3Q    Max 
-70.62 -22.25   4.00  28.34  43.38 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   87.250      7.003  12.459 3.63e-11 ***
Practice1    -12.688      4.952  -2.562   0.0182 *  
Practice2    -19.438      8.577  -2.266   0.0341 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.31 on 21 degrees of freedom
Multiple R-squared:  0.3578,    Adjusted R-squared:  0.2967 
F-statistic:  5.85 on 2 and 21 DF,  p-value: 0.009559
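# (A closing sketch, assuming the equal group sizes of 8 that we have
# here and orthogonal weights: with equal n, each contrast-coded slope
# equals the contrast value divided by the sum of the squared weights,
# which is why the t test on the slope is a test of the contrast.)

psi1 <- sum(c(1, 1, -2) *
            tapply(Time, Practice, mean)[c("massed", "spaced", "none")])
psi1 / sum(c(1, 1, -2)^2)  # -76.125 / 6 = -12.6875, the Practice1 slope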