# The confidence interval R provides in two-sample t test
# output is an interval for the difference between means.
# In general, if you can test it with a t test, you can
# construct a confidence interval by calculating the thing of
# interest plus or minus its standard error times the critical
# t. We've seen this before with the mean. Here, it's the
# difference between means. Similar intervals can be constructed
# for regression slopes.

> attach(JackStatlab)
> t.test(CTPEA~CBSEX, var.equal=TRUE)

	Two Sample t-test

data:  CTPEA by CBSEX
t = -0.9583, df = 48, p-value = 0.3427
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.929485   3.873929
sample estimates:
mean in group 0 mean in group 1 
       78.75000        82.27778 

# The interval is centered on the observed difference between means:

> 78.75 - 82.27778
[1] -3.52778
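# To see where that interval comes from, we can build it by hand.
# Here is a sketch (assuming JackStatlab is still attached): pool
# the two group variances, convert that to the standard error of
# the difference, and take the difference plus or minus the
# critical t on 48 df. It should reproduce the interval printed
# above:

> n0 <- sum(CBSEX==0)
> n1 <- sum(CBSEX==1)
> sp2 <- ((n0-1)*var(CTPEA[CBSEX==0]) + (n1-1)*var(CTPEA[CBSEX==1])) /
+   (n0+n1-2)
> se <- sqrt(sp2*(1/n0 + 1/n1))
> (78.75 - 82.27778) + c(-1,1) * qt(.975, n0+n1-2) * se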
# I do an "unimproved" scatterplot of Raven vs. Peabody:

> Raven <- CTRA
> Peabody <- CTPEA
> plot(Peabody, Raven)

# The covariance, calculated here from its defining formula
# (n - 1 = 49 in the denominator), is a measure of the strength
# of a linear association:

> sum((Peabody-mean(Peabody))*(Raven-mean(Raven))) / 49
[1] 77.30408

# Here's the easy way to calculate it (given two variables, var()
# returns their covariance):

> var(Peabody,Raven)
[1] 77.30408

# The problem with covariance is that its magnitude depends
# on the scale of the variables. For example, if I plot 3*Raven
# instead of Raven, the scatterplot looks identical (except for
# the numbers on the Y axis)...

> plot(Peabody, 3*Raven)

# ...but the covariance is tripled:

> var(Peabody, 3*Raven)
[1] 231.9122

# We can eliminate that problem by making the measure scale
# free: divide the covariance by the standard deviation of each
# variable. Note that the result remains the same when we change
# the scale of the Raven:

> var(Peabody,Raven)/sd(Peabody)/sd(Raven)
[1] 0.5844355
> var(Peabody,3*Raven)/sd(Peabody)/sd(3*Raven)
[1] 0.5844355

# That is the Pearson product-moment correlation coefficient.
# Here is the easy way to calculate it:

> cor(Peabody,Raven)
[1] 0.5844355

# Here is a perfect relationship that is not linear:

> x <- seq(-2,2,.2)
> y <- x^2
> plot(x,y)

# The correlation between x and y, though, is zero (the tiny value
# below is just floating-point rounding error). Correlation
# is a valid measure of the strength of association only
# if the relationship is linear:

> cor(x,y)
[1] 1.216307e-16

# Here's a rough guess of the slope of a line that fits the
# plot of Raven vs. Peabody. When Peabody is 60, the approximate
# center of the Raven distribution is about 20. When Peabody is
# 120, the center of the Raven distribution is about 50. The ratio
# of the change in Raven to the change in Peabody approximates the
# slope:

> plot(Peabody,Raven)
> (50-20)/(120-60)
[1] 0.5

# It's harder to visualize where the line would cross the Y axis at X=0,
# especially if, as is the case here, 0 falls well beyond the range of X.
# We try several values for the intercept. Finally, -10 seems to give us
# a line that is pretty close to the conditional center of Raven:

> abline(-40, .5)
> abline(-25, .5)
> abline(-15, .5)
> abline(-10, .5)

# Here is the estimate of the slope, working with the formula from
# the PowerPoint. Our approximation of .5 was pretty good:

> sum((Peabody-mean(Peabody))*(Raven-mean(Raven))) /
+   sum((Peabody-mean(Peabody))^2)
[1] 0.4959945

# Let's save that slope:

> sum((Peabody-mean(Peabody))*(Raven-mean(Raven))) /
+   sum((Peabody-mean(Peabody))^2) -> slope

# Now we can use it to calculate the intercept. The least-squares
# line passes through the point of means, so the intercept is
# mean(Raven) - slope*mean(Peabody):

> mean(Raven) - slope*mean(Peabody) -> int
> slope
[1] 0.4959945
> int
[1] -7.589479

# So the actual regression line is just a bit higher
# than our final approximate one, which assumed an intercept
# of -10:

> abline(int,slope)

# Here is the easy way to calculate the regression estimates:

> lm(Raven~Peabody)

Call:
lm(formula = Raven ~ Peabody)

Coefficients:
(Intercept)      Peabody  
     -7.589        0.496  
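# lm() also makes it easy to draw the fitted line. A sketch:
# abline() accepts a fitted model object directly, so this should
# draw the same line we just added with abline(int, slope):

> fit <- lm(Raven~Peabody)
> plot(Peabody, Raven)
> abline(fit)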
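# And, closing the loop on the comment at the top: the same
# estimate-plus-or-minus-critical-t recipe gives a confidence
# interval for a regression slope. A sketch using confint() on the
# fit above; the Peabody row should be centered on the slope
# estimate, 0.496:

> confint(fit)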