# We illustrated the idea of the bootstrap by comparing a parametric
# confidence interval for the mean with a bootstrapped confidence interval.
# I get the parametric interval by using a one-sample t test with a
# completely arbitrary null hypothesis (all I really care about is the
# confidence interval):

> attach(JackStatlab)
> t.test(CTPEA, mu=10)

	One Sample t-test

data:  CTPEA
t = 37.472, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
 77.77915 85.46085
sample estimates:
mean of x 
    81.62 

# Now we do a bootstrapped interval:

> MeanOut <- NULL
> for(i in 1:10000) MeanOut[i] <- mean(sample(CTPEA, 50, replace=TRUE))
> hist(MeanOut)
> quantile(MeanOut, .025, type=2)
 2.5% 
78.04 
> quantile(MeanOut, .975, type=2)
97.5% 
 85.5 

# Note that we got more or less the same result. Of course, there's
# no reason to do a bootstrapped interval for the mean, because we
# have good sampling theory for the mean. But consider, for example,
# Pearson's skew index. We know that my sample of Peabody scores
# has the appearance of minor positive skew:

> stem(CTPEA)

  The decimal point is 1 digit(s) to the right of the |

   5 | 9
   6 | 0244568999
   7 | 1222457889
   8 | 000111223455666899
   9 | 01256
  10 | 33468
  11 | 
  12 | 2

# Pearson's index is slightly positive. But is that a surprising
# value if the distribution is actually symmetric?

> pskew(CTPEA)
[1] 0.1376274

# We do a bootstrapped estimate of the sampling distribution of
# Pearson's skew index:

> SkewOut <- NULL
> for(i in 1:10000) SkewOut[i] <- pskew(sample(CTPEA, 50, replace=TRUE))
> hist(SkewOut)
> quantile(SkewOut, .025, type=2)
      2.5% 
-0.4255131 
> quantile(SkewOut, .975, type=2)
    97.5% 
0.6691447 

# The bounds of the bootstrapped confidence interval run from -0.43 to 0.67,
# so it is entirely reasonable to conclude that there's no convincing evidence
# of positive skew.

# The rest of our R work today was just to facilitate comparison with
# Bayesian results.
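Note that pskew() is not a base-R function; it was defined for the course. A minimal sketch consistent with the values above would be Pearson's second skewness coefficient, though the course's actual definition may differ:

```r
# Assumed definition of pskew(): Pearson's second skewness coefficient,
# 3 * (mean - median) / sd. Positive values indicate positive skew.
pskew <- function(x) {
  3 * (mean(x) - median(x)) / sd(x)
}

# An exactly symmetric sample gives 0:
pskew(c(1, 2, 3, 4, 5))
# [1] 0
```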
# We calculated the mean Peabody score for girls
# and boys:

> tapply(CTPEA, CBSEX, mean)
       0        1 
76.73077 86.91667 

# Also, the standard deviations:

> tapply(CTPEA, CBSEX, sd)
       0        1 
11.64666 13.61558 

# We revisited the t test:

> t.test(CTPEA~CBSEX, var.equal=TRUE)

	Two Sample t-test

data:  CTPEA by CBSEX
t = -2.8494, df = 48, p-value = 0.006434
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -17.373374  -2.998421
sample estimates:
mean in group 0 mean in group 1 
       76.73077        86.91667 

# We saw that a precision of .00000000000000000001 corresponds
# to a really huge standard deviation, confirming that our prior
# distributions were not informative:

> sqrt(1/.00000000000000000001)
[1] 1e+10

# We compared the sample mean for the girls to the mean of
# the Bayesian posterior distribution for the girls' mean:

> mean(CTPEA[1:26])
[1] 76.73077

# We saw that the Bayesian posterior means for standard deviations
# were a little higher than the sample standard deviations. That's
# because it's the MODE of the posterior distribution that's equivalent
# to frequentist estimates when we have uninformative priors. The posterior
# distributions for the standard deviations are a little positively skewed,
# so the mean of the posterior is a little higher than the mode of
# the posterior.

> tapply(CTPEA, CBSEX, sd)
       0        1 
11.64666 13.61558 

# Finally, we put a Normal(mu=0, precision=100) prior on the slope that
# allows the boys' and girls' means to differ. Here, we confirm that this
# is a HIGHLY informative prior, corresponding to a normal distribution with
# mean=0 and sd=0.1:

> sqrt(1/100)
[1] 0.1
>
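The mean-above-mode point can be illustrated with any positively skewed distribution. Here is a small sketch using a Gamma distribution; the shape and rate values are arbitrary illustrations, not our actual posterior:

```r
# For a Gamma(shape, rate) distribution, which is positively skewed:
#   mean = shape / rate
#   mode = (shape - 1) / rate   (valid for shape >= 1)
shape <- 5
rate  <- 2
gamma_mean <- shape / rate        # 2.5
gamma_mode <- (shape - 1) / rate  # 2.0
gamma_mean > gamma_mode
# [1] TRUE  -- positive skew pulls the mean above the mode
```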