# We reviewed the idea of added variable plots.

> attach(BigFive)

# Here, we isolate the part of Neuroticism (yres) that can't
# be predicted by Depression, as well as the part of Agreeableness
# (xres) that can't be predicted by Depression:

> yres <- lm(Neuroticism~Depression)$res
> xres <- lm(Agreeableness~Depression)$res

# The slope for Agreeableness in the multiple regression...

> lm(Neuroticism~Depression+Agreeableness)

Call:
lm(formula = Neuroticism ~ Depression + Agreeableness)

Coefficients:
  (Intercept)     Depression  Agreeableness
      55.1270         1.0918        -0.4049

# ...is the same as the slope from the regression of yres on xres:

> lm(yres~xres)

Call:
lm(formula = yres ~ xres)

Coefficients:
(Intercept)         xres
 -1.055e-16   -4.049e-01

# (The intercept here is zero up to floating-point error, because
# both sets of residuals necessarily have mean zero.)

# The added variable plot gives a graphical representation of that
# relationship. Think of it as being like the raw scatterplot...

> par(pin=c(6,4))
> plot(Agreeableness,Neuroticism)

# ...but with the systematic variability associated with Depression
# removed from the picture:

> plot(xres,yres)

# Notice that the relationship in the added variable plot (yres vs.
# xres) appears a bit cleaner than in the raw scatterplot. That's
# because some of the variation that appears as noise in the raw plot
# is actually systematic variation associated with Depression that
# has been quite literally removed from the picture.

# We can do inference about the slopes in multiple regression using
# exactly the same syntax that we used for simple regression:

> summary(lm(Neuroticism~Depression+Agreeableness))

Call:
lm(formula = Neuroticism ~ Depression + Agreeableness)

Residuals:
    Min      1Q  Median      3Q     Max
-47.387 -18.687   2.273  17.682  45.497

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    55.1270    13.9587   3.949 0.000328 ***
Depression      1.0918     0.3040   3.591 0.000930 ***
Agreeableness  -0.4049     0.1744  -2.322 0.025698 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.36 on 38 degrees of freedom
Multiple R-squared:  0.4351,	Adjusted R-squared:  0.4053
F-statistic: 14.63 on 2 and 38 DF,  p-value: 1.941e-05

# We can also check assumptions for multiple regression in
# the same way we did for simple regression. The plot of
# residuals against fitted values is useful for checking
# the assumption that the variability of the errors is constant
# across the range of fitted values:

> lm(Neuroticism~Depression+Agreeableness) -> reg
> plot(reg$fit, reg$res)

# We can assess the assumption of normally distributed errors
# by applying our usual graphical methods to the residuals:

> stem(reg$res)

  The decimal point is 1 digit(s) to the right of the |

  -4 | 7
  -3 | 100
  -2 | 975431
  -1 | 9732
  -0 | 99310
   0 | 024455
   1 | 1155689
   2 | 0236788
   3 | 0
   4 | 5

> qqnorm(reg$res); qqline(reg$res)

# We might be a bit concerned about the normality assumption, but we
# were far from the decision boundary when we concluded that both
# slopes are nonzero, so we can probably appeal to robustness and not
# be too worried.
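# As an aside (a sketch we didn't run above): if the car package is
# installed, its avPlots() function draws the added variable plot for
# every predictor in one call, which is a handy check on the manual
# residual-on-residual construction:

> # install.packages("car")   # one-time install, if needed
> library(car)
> avPlots(reg)

# The Agreeableness panel should reproduce plot(xres, yres), and the
# line drawn through it should have slope -0.4049, the coefficient
# from the multiple regression.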
# Here's the math practice ANOVA example:

> math <- read.csv("http://faculty.ucmerced.edu/jvevea/classes/202a/data/math.csv")
> head(math)
  Time Practice
1   51   massed
2   96   massed
3   97   massed
4   22   massed
5   35   massed
6   36   massed

# There are massed practice, spaced practice, and no-practice
# conditions, and the outcome of interest is time to complete a set
# of math problems (faster is better).

# It's simple to calculate the ANOVA in R. The F ratio of 5.85
# (p = .0096) allows us to reject the null hypothesis that all
# three population means are the same.

> attach(math)
> anova(lm(Time~Practice))
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value   Pr(>F)
Practice   2  13772  6885.9  5.8504 0.009559 **
Residuals 21  24717  1177.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Now, let's think about the logic behind ANOVA. ANOVA relies
# on the idea that we can estimate variability in two ways:
# one that is valid regardless of whether the null hypothesis
# is true, and one that is valid if the null is true but otherwise
# will tend to be too large. The first method is analogous to
# the pooled variance estimate in the t test: take a weighted
# average of the variances in all groups. Here, n=8 in all groups,
# so we don't need to weight:

> tapply(Time, Practice, var)
   massed      none    spaced
 925.8393 1668.2679  936.8571

# The average of those variance estimates is...

> mean(tapply(Time, Practice, var))
[1] 1176.988

# ...which matches the residual mean square that R gave us.

# The other way of estimating variability looks at how variable
# the sample means are:

> tapply(Time, Practice, mean)
 massed    none  spaced
 55.125 112.625  94.000

# The Central Limit Theorem tells us that the variance of a sample
# mean is the population variance divided by the sample size. Turning
# that around, the variance of those means multiplied by the sample
# size on which each mean is based should be a valid estimate of the
# population variance (but only if the three means are actually
# estimating the same population mean):

> var(tapply(Time, Practice, mean)) * 8
[1] 6885.875

# Note that that result matches the Practice mean square in the
# ANOVA table.
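# To close the loop (another sketch we didn't run above; the expected
# values in the comments come straight from the ANOVA table), we can
# assemble the F ratio from those two variance estimates ourselves
# and get its p-value from the F distribution:

> msbetween <- var(tapply(Time, Practice, mean)) * 8  # 6885.875
> mswithin <- mean(tapply(Time, Practice, var))       # 1176.988
> msbetween / mswithin                                # F = 5.8504
> pf(msbetween / mswithin, 2, 21, lower.tail=FALSE)   # p = 0.009559

# The degrees of freedom are 2 (three groups minus one) and 21
# (24 observations minus the three estimated group means), matching
# the ANOVA table.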