# The confidence interval R provides in two-sample t test
# output is an interval for the difference between means.
# In general, if you can test it with a t test, you can
# construct a confidence interval by calculating the thing of
# interest plus or minus its standard error times the critical
# t. We've seen this before with the mean. Here, it's the
# difference between means. Similar intervals can be constructed
# for regression slopes.

> attach(JackStatlab)
> t.test(CTPEA~CBSEX, var.equal=TRUE)

	Two Sample t-test

data:  CTPEA by CBSEX
t = -0.9583, df = 48, p-value = 0.3427
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.929485   3.873929
sample estimates:
mean in group 0 mean in group 1 
       78.75000        82.27778 

# The interval is centered on the observed difference between means:

> 78.75 - 82.27778
[1] -3.52778
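# To see where that interval comes from, we can build it by hand.
# Here is a sketch (assuming JackStatlab is still attached): pool
# the two group variances, convert that to the standard error of
# the difference, and take the difference plus or minus the
# critical t on 48 df. It should reproduce the interval printed
# above:

> n0 <- sum(CBSEX==0)
> n1 <- sum(CBSEX==1)
> sp2 <- ((n0-1)*var(CTPEA[CBSEX==0]) + (n1-1)*var(CTPEA[CBSEX==1])) /
+   (n0+n1-2)
> se <- sqrt(sp2*(1/n0 + 1/n1))
> (78.75 - 82.27778) + c(-1,1) * qt(.975, n0+n1-2) * se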
# I do an "unimproved" scatterplot of Raven vs. Peabody:

> Raven <- CTRA
> Peabody <- CTPEA
> plot(Peabody, Raven)

# The covariance, calculated here from its defining formula
# (n - 1 = 49 in the denominator), is a measure of the strength
# of a linear association:

> sum((Peabody-mean(Peabody))*(Raven-mean(Raven))) / 49
[1] 77.30408

# Here's the easy way to calculate it (given two variables, var()
# returns their covariance):

> var(Peabody,Raven)
[1] 77.30408

# The problem with covariance is that its magnitude depends
# on the scale of the variables. For example, if I plot 3*Raven
# instead of Raven, the scatterplot looks identical (except for
# the numbers on the Y axis)...

> plot(Peabody, 3*Raven)

# ...but the covariance is tripled:

> var(Peabody, 3*Raven)
[1] 231.9122

# We can eliminate that problem by making the measure scale
# free: divide the covariance by the standard deviation of each
# variable. Note that the result remains the same when we change
# the scale of the Raven:

> var(Peabody,Raven)/sd(Peabody)/sd(Raven)
[1] 0.5844355
> var(Peabody,3*Raven)/sd(Peabody)/sd(3*Raven)
[1] 0.5844355

# That is the Pearson product-moment correlation coefficient.
# Here is the easy way to calculate it:

> cor(Peabody,Raven)
[1] 0.5844355

# Here is a perfect relationship that is not linear:

> x <- seq(-2,2,.2)
> y <- x^2
> plot(x,y)

# The correlation between x and y, though, is zero (the tiny value
# below is just floating-point rounding error). Correlation
# is a valid measure of the strength of association only
# if the relationship is linear:

> cor(x,y)
[1] 1.216307e-16

# Here's a rough guess of the slope of a line that fits the
# plot of Raven vs. Peabody. When Peabody is 60, the approximate
# center of the Raven distribution is about 20. When Peabody is
# 120, the center of the Raven distribution is about 50. The ratio
# of the change in Raven to the change in Peabody approximates the
# slope:

> plot(Peabody,Raven)
> (50-20)/(120-60)
[1] 0.5

# It's harder to visualize where the line would cross the Y axis at X=0,
# especially if, as is the case here, 0 falls well beyond the range of X.
# We try several values for the intercept. Finally, -10 seems to give us
# a line that is pretty close to the conditional center of Raven:

> abline(-40, .5)
> abline(-25, .5)
> abline(-15, .5)
> abline(-10, .5)

# Here is the estimate of the slope, working with the formula from
# the PowerPoint. Our approximation of .5 was pretty good:

> sum((Peabody-mean(Peabody))*(Raven-mean(Raven))) /
+   sum((Peabody-mean(Peabody))^2)
[1] 0.4959945

# Let's save that slope:

> sum((Peabody-mean(Peabody))*(Raven-mean(Raven))) /
+   sum((Peabody-mean(Peabody))^2) -> slope

# Now we can use it to calculate the intercept. The least-squares
# line passes through the point of means, so the intercept is
# mean(Raven) - slope*mean(Peabody):

> mean(Raven) - slope*mean(Peabody) -> int
> slope
[1] 0.4959945
> int
[1] -7.589479

# So the actual regression line is just a bit higher
# than our final approximate one, which assumed an intercept
# of -10:

> abline(int,slope)

# Here is the easy way to calculate the regression estimates:

> lm(Raven~Peabody)

Call:
lm(formula = Raven ~ Peabody)

Coefficients:
(Intercept)      Peabody  
     -7.589        0.496  
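# lm() also makes it easy to draw the fitted line. A sketch:
# abline() accepts a fitted model object directly, so this should
# draw the same line we just added with abline(int, slope):

> fit <- lm(Raven~Peabody)
> plot(Peabody, Raven)
> abline(fit)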
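# And, closing the loop on the comment at the top: the same
# estimate-plus-or-minus-critical-t recipe gives a confidence
# interval for a regression slope. A sketch using confint() on the
# fit above; the Peabody row should be centered on the slope
# estimate, 0.496:

> confint(fit)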