# We illustrated the idea of the bootstrap by comparing a parametric
# confidence interval for the mean with a bootstrapped confidence interval.
# I get the parametric interval by using a one-sample t test with a
# completely arbitrary null hypothesis (all I really care about is the
# confidence interval):

> attach(JackStatlab)
> t.test(CTPEA, mu=10)

	One Sample t-test

data:  CTPEA
t = 37.472, df = 49, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
 77.77915 85.46085
sample estimates:
mean of x 
    81.62 

# Now we do a bootstrapped interval:

> MeanOut <- NULL
> for(i in 1:10000) MeanOut[i] <- mean(sample(CTPEA, 50, replace=TRUE))
> hist(MeanOut)
> quantile(MeanOut, .025, type=2)
 2.5% 
78.04 
> quantile(MeanOut, .975, type=2)
97.5% 
 85.5 

# Note that we got more or less the same result. Of course, there's
# no reason to do a bootstrapped interval for the mean, because we
# have good sampling theory for the mean. But consider, for example,
# Pearson's skew index. We know that my sample of Peabody scores
# has the appearance of minor positive skew:

> stem(CTPEA)

  The decimal point is 1 digit(s) to the right of the |

   5 | 9
   6 | 0244568999
   7 | 1222457889
   8 | 000111223455666899
   9 | 01256
  10 | 33468
  11 | 
  12 | 2

# Pearson's index is slightly positive. But is that a surprising
# value if the distribution is actually symmetric?

> pskew(CTPEA)
[1] 0.1376274

# We do a bootstrapped estimate of the sampling distribution of
# Pearson's skew index:

> SkewOut <- NULL
> for(i in 1:10000) SkewOut[i] <- pskew(sample(CTPEA, 50, replace=TRUE))
> hist(SkewOut)
> quantile(SkewOut, .025, type=2)
      2.5% 
-0.4255131 
> quantile(SkewOut, .975, type=2)
    97.5% 
0.6691447 

# The bounds of the bootstrapped confidence interval run from -0.43 to 0.67,
# so it is entirely reasonable to conclude that there's no convincing evidence
# of positive skew.

# The rest of our R work today was just to facilitate comparison with
# Bayesian results.
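Note that pskew() is not a base-R function; it was defined for the course. A minimal sketch consistent with the values above would be Pearson's second skewness coefficient, though the course's actual definition may differ:

```r
# Assumed definition of pskew(): Pearson's second skewness coefficient,
# 3 * (mean - median) / sd. Positive values indicate positive skew.
pskew <- function(x) {
  3 * (mean(x) - median(x)) / sd(x)
}

# An exactly symmetric sample gives 0:
pskew(c(1, 2, 3, 4, 5))
# [1] 0
```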
# We calculated the mean Peabody score for girls
# and boys:

> tapply(CTPEA, CBSEX, mean)
       0        1 
76.73077 86.91667 

# Also, the standard deviations:

> tapply(CTPEA, CBSEX, sd)
       0        1 
11.64666 13.61558 

# We revisited the t test:

> t.test(CTPEA~CBSEX, var.equal=TRUE)

	Two Sample t-test

data:  CTPEA by CBSEX
t = -2.8494, df = 48, p-value = 0.006434
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -17.373374  -2.998421
sample estimates:
mean in group 0 mean in group 1 
       76.73077        86.91667 

# We saw that a precision of .00000000000000000001 corresponds
# to a really huge standard deviation, confirming that our prior
# distributions were not informative:

> sqrt(1/.00000000000000000001)
[1] 1e+10

# We compared the sample mean for the girls to the mean of
# the Bayesian posterior distribution for the girls' mean:

> mean(CTPEA[1:26])
[1] 76.73077

# We saw that the Bayesian posterior means for standard deviations
# were a little higher than the sample standard deviations. That's
# because it's the MODE of the posterior distribution that's equivalent
# to frequentist estimates when we have uninformative priors. The posterior
# distributions for the standard deviations are a little positively skewed,
# so the mean of the posterior is a little higher than the mode of
# the posterior.

> tapply(CTPEA, CBSEX, sd)
       0        1 
11.64666 13.61558 

# Finally, we put a Normal(mu=0, precision=100) prior on the slope that
# allows the boys' and girls' means to differ. Here, we confirm that this
# is a HIGHLY informative prior, corresponding to a normal distribution with
# mean=0 and sd=0.1:

> sqrt(1/100)
[1] 0.1
>
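The mean-above-mode point can be illustrated with any positively skewed distribution. Here is a small sketch using a Gamma distribution; the shape and rate values are arbitrary illustrations, not our actual posterior:

```r
# For a Gamma(shape, rate) distribution, which is positively skewed:
#   mean = shape / rate
#   mode = (shape - 1) / rate   (valid for shape >= 1)
shape <- 5
rate  <- 2
gamma_mean <- shape / rate        # 2.5
gamma_mode <- (shape - 1) / rate  # 2.0
gamma_mean > gamma_mode
# [1] TRUE  -- positive skew pulls the mean above the mode
```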