# We learned how to draw random samples in R, which you will need in order to
# select your sample for Homework 1.

# Here, we use the "sample" function to take 5 random draws from the
# integers 1 to 10:

> sample( 1:10, 5)
[1]  5 10  7  1  8
> sample( 1:10, 5)
[1] 6 9 7 1 8
> sample( 1:10, 5)
[1]  3  2  5 10  9
> sample( 1:10, 5)
[1]  5  3  4 10  7

# Notice that we got a different sample each time, with no duplication.
# That's because, by default, R samples without replacement: Once a number
# has been drawn, it can't be drawn again. For that reason, if we try to
# sample more than 10 draws from the integers 1 to 10, it won't work:

> sample( 1:10, 15)
Error in sample.int(length(x), size, replace, prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'

# We can make R sample WITH replacement like this:

> sample( 1:10, 15, replace=TRUE)
 [1]  5  9 10  6  3  7  3  6  3  5  5  5  2  7  3
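
# With 15 draws from only 10 distinct values, at least one value has to repeat.
# A quick check (a sketch; "draws" is just a name introduced here):

> draws <- sample( 1:10, 15, replace=TRUE)
> any(duplicated(draws))
[1] TRUE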

# And if we want to be absolutely certain that we are sampling without
# replacement, we can specify that:

> sample( 1:10, 15, replace=FALSE)
Error in sample.int(length(x), size, replace, prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'
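
# A hedged aside: because the draws change on every call, you may want a way to
# get the same "random" sample again later (say, to re-create a homework
# sample). Setting the random seed first does that. The seed value 202 and the
# names "first" and "second" are arbitrary choices made for this illustration.

> set.seed(202)
> first <- sample( 1:10, 5)
> set.seed(202)
> second <- sample( 1:10, 5)
> identical(first, second)
[1] TRUE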

# Here's my Peabody data set:

> Peabody
 [1]  69  72  94  64  80  77  96  86  89  69  92  71  81  90  84  76 100  57  61
[20]  84  81  65  87  92  89  79  91  65  91  81  86  85  95  93  83  76  84  90
[39]  95  67
 
# Even though I sample without replacement, I draw two 69s. But that's because 
# there's more than one 69 in the data set. They're the same value, but not the
# same person.

> sample(Peabody, 5, replace=FALSE)
[1] 83 57 69 84 69
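
# To confirm that, we can ask which positions in the data set hold a 69.
# (This assumes Peabody is in the workspace exactly as printed above.)

> which(Peabody == 69)
[1]  1 10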
 
# Here, I create a variable called "choices" to represent 50 individual cases
# that I will sample from the Statlab data set. It's very important that this
# sampling be WITHOUT replacement, so that no case is selected twice.

> choices <- sample(1:1296, 50, replace=FALSE)
> choices
 [1]  733 1238 1027  847  646  447  322  236 1234  439  979  819 1044 1051  104
[16]  170   63 1151  634  366  673  372 1070  813  221  423   31   78  344  367
[31] 1282  727   64  581  128  384  588  348  463 1176  732    7  288  310  237
[46]  561   74  799  450  114
> choices <- sort(choices)
> choices
 [1]    7   31   63   64   74   78  104  114  128  170  221  236  237  288  310
[16]  322  344  348  366  367  372  384  423  439  447  450  463  561  581  588
[31]  634  646  673  727  732  733  799  813  819  847  979 1027 1044 1051 1070
[46] 1151 1176 1234 1238 1282
 
# So my sample will contain the 7th, 31st, 63rd (and so on) case in the data set.
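
# Because the sampling was without replacement, no case number can appear
# twice. A quick sanity check:

> length(choices)
[1] 50
> any(duplicated(choices))
[1] FALSE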

# Once again, here's how to read in the Statlab data set. (You can copy and paste
# this.)

> Statlab <- read.csv("http://faculty.ucmerced.edu/jvevea/classes/202a/data/statlab (abridged).csv")
> head(Statlab)
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
1 1111     0   20.0   6.6   55.7    85    85   34   17   119   66.0   130   19
2 1112     0   20.0   6.4   48.9    59    74   34   17   130   62.8   159   23
3 1113     0   19.8   6.1   54.9    70    64   25   18   134   66.1   138   21
4 1114     0   19.5   7.0   53.6    88    87   43   18   135   61.8   123   26
5 1115     0   19.5   7.9   53.4    68    87   40   18   130   62.8   146   21
6 1116     0   22.0   9.5   59.9    93    83   37   18   104   63.4   116   17
  FTHGHT FTWGT FIB FIT
1   70.1   171  33 150
2   65.0   130  40 175
3   70.0   175  44 116
4   71.8   196  42 112
5   68.0   163  50 129
6   74.0   180   0 214
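
# Rather than typing 1296 by hand, we could ask R for the number of cases and
# sample from that. This is only a sketch; "morechoices" is a name introduced
# here for illustration, not part of the homework instructions.

> nrow(Statlab)
[1] 1296
> morechoices <- sort(sample(1:nrow(Statlab), 50, replace=FALSE))
> length(morechoices)
[1] 50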

# Now we need to understand a bit more about specifying particular elements
# in a data frame. We've already seen that we can use bracket notation to
# specify a particular case in a simple variable. Here, for example, we see
# that the third Peabody score is 94:

> Peabody
 [1]  69  72  94  64  80  77  96  86  89  69  92  71  81  90  84  76 100  57  61
[20]  84  81  65  87  92  89  79  91  65  91  81  86  85  95  93  83  76  84  90
[39]  95  67
> Peabody[3]
[1] 94

# When we have a variable with both rows and columns (called a "data frame"),
# we can use similar bracket notation to specify first a row and then a column.
# So, for example, the value of the third variable (CBLGTH) for the first child
# is 20, as you can see in the "head" of the data set, above, and using bracket
# notation here:

> Statlab[1,3]
[1] 20

# Similarly, the third child's value for CBWGT is 6.1:

> Statlab[3,4]
[1] 6.1
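
# Counting columns by hand is error-prone, so it can be safer to give the
# column's name in quotes instead of its position. These two lines ask for the
# same value as Statlab[3,4]:

> Statlab[3,"CBWGT"]
[1] 6.1
> Statlab$CBWGT[3]
[1] 6.1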

# If we want the entire row of values for a particular case, we can specify
# it like this:

> Statlab[1,]
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
1 1111     0     20   6.6   55.7    85    85   34   17   119     66   130   19
  FTHGHT FTWGT FIB FIT
1   70.1   171  33 150

# And we can get multiple rows that way:

> Statlab[1:3,]
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
1 1111     0   20.0   6.6   55.7    85    85   34   17   119   66.0   130   19
2 1112     0   20.0   6.4   48.9    59    74   34   17   130   62.8   159   23
3 1113     0   19.8   6.1   54.9    70    64   25   18   134   66.1   138   21
  FTHGHT FTWGT FIB FIT
1   70.1   171  33 150
2   65.0   130  40 175
3   70.0   175  44 116

# Note that we could do the same thing for columns. If I wanted every child's
# CBSEX, say, I could do it like this:  Statlab[,2]. (I didn't do that in class
# because I didn't want to see 1296 values filling up the screen.)
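
# To peek at a column without filling the screen, we can wrap the request in
# head(). CBSEX is the second column in the head(Statlab) listing above, so
# asking by position or by name gives the same six values:

> head(Statlab[,2])
[1] 0 0 0 0 0 0
> head(Statlab[,"CBSEX"])
[1] 0 0 0 0 0 0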

# Here are the 50 cases I randomly selected:

> choices
 [1]    7   31   63   64   74   78  104  114  128  170  221  236  237  288  310
[16]  322  344  348  366  367  372  384  423  439  447  450  463  561  581  588
[31]  634  646  673  727  732  733  799  813  819  847  979 1027 1044 1051 1070
[46] 1151 1176 1234 1238 1282

# The first choice (after sorting) was case #7:

> Statlab[7,]
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
7 1121     0     21   7.1   53.1    72    81   33   18   145   65.4   220   23
  FTHGHT FTWGT FIB FIT
7   68.1   173  55 142

# The next was case #31:

> Statlab[31,]
   CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
31 1161     0     21   6.8   52.8    64    94   25   20   140   66.3   147   28
   FTHGHT FTWGT FIB FIT
31     71   180  66 120

# If I create a new data frame, selecting the rows corresponding to ALL the
# choices, you'll see that the first two rows of the new data set are the
# 7th and 31st cases, which we just saw above (and the new data frame continues,
# including the 63rd, 64th, 74th cases, and so on).

> JackStatlab <- Statlab[choices,]
> head(JackStatlab)
   CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
7  1121     0   21.0   7.1   53.1    72    81   33   18   145   65.4   220   23
31 1161     0   21.0   6.8   52.8    64    94   25   20   140   66.3   147   28
63 1253     0   19.0   5.6   56.8    80    84   42   21   124   63.8   139   22
64 1254     0   20.5   8.6   57.6    78    74   20   21   134   68.1   143   22
74 1312     0   20.0   6.4   50.5    63    71   21   21    98   62.9   100   22
78 1316     0   20.0   7.6   51.3    58    59   15   21   135   66.0   125   24
   FTHGHT FTWGT FIB FIT
7    68.1   173  55 142
31   71.0   180  66 120
63   72.0   170  37 100
64   73.5   185  90 220
74   72.0   160  80 104
78   67.5   169  40 169
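
# A quick check that the new data frame holds exactly the cases we chose. The
# row labels carried over from Statlab are the original case numbers, so they
# should match "choices" one for one:

> nrow(JackStatlab)
[1] 50
> all(rownames(JackStatlab) == choices)
[1] TRUE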

# If I were doing the homework assignment, I would want to save this as a CSV
# file to submit to Catcourses:

> write.csv(JackStatlab, "c:/users/jvevea/Desktop/JackStatlab.csv")
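
# By default, write.csv also writes the row labels (here, the original case
# numbers) as an extra first column. If you would rather leave them out, the
# row.names argument controls that. A sketch using a temporary file ("tmp" is a
# name introduced here) so that nothing on the Desktop is overwritten:

> tmp <- tempfile(fileext=".csv")
> write.csv(JackStatlab, tmp, row.names=FALSE)
> ncol(read.csv(tmp))
[1] 17
> write.csv(JackStatlab, tmp)
> ncol(read.csv(tmp))
[1] 18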
 
# Previously, we've seen that we can avoid cumbersome $ notation (e.g., Statlab$CBSEX)
# by "attaching" the data frame:

> attach(Statlab)
> CBSEX
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [815] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1037] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1074] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1111] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1148] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1222] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1259] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1296] 1

# If I attach another data set that has at least some of the same variable names,
# R will alert me to that fact:

> attach(JackStatlab)
The following objects are masked from Statlab:

    CBLGTH, CBSEX, CBWGT, CODE, CTHGHT, CTPEA, CTRA, CTWGT, FBAG, FIB,
    FIT, FTHGHT, FTWGT, MBAG, MBWGT, MTHGHT, MTWGT

# Now, CBSEX refers to that variable in the most recently attached data frame:

> CBSEX
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1

# But if I detach JackStatlab, the originally attached data frame is still attached:

> detach(JackStatlab)
> CBSEX
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [815] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1037] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1074] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1111] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1148] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1222] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1259] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1296] 1
 
# Here's the variable in my sample. This time, because only the larger data frame
# is attached, I need to use the $ notation:

> JackStatlab$CBSEX
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1
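
# Another option, offered here as a hedged alternative rather than something we
# did in class, is with(). It looks up variable names inside one data frame for
# a single command and leaves nothing attached afterward:

> with(JackStatlab, length(CBSEX))
[1] 50
> with(JackStatlab, sum(CBSEX))
[1] 18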

# Often, rather than attaching the data frame, it will be easier to create a new
# variable. This has the advantage of allowing us a name that will look better in
# graphics labels (without the need to specify new labels using arguments like
# main and xlab):

> Sex <- JackStatlab$CBSEX
> Sex
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1

# We discussed the behavior of mean and standard deviation under linear transformation.
# We've already done an example of a linear transformation when we created a new
# family income variable. The original variable is in hundreds of dollars:

> head(FIT)
[1] 150 175 116 112 129 214

# In a previous session, we transformed that to the more convenient metric of dollars
# by multiplying by 100. Note that this is a linear transformation of the form
# Y = 0 + 100*X.

> head(Income)
[1] 15000 17500 11600 11200 12900 21400

# According to the rule for change under linear transformation, the new mean should
# be 0 plus 100 times the old mean:

> mean(FIT)
[1] 155.1944
> 0 + 100*155.1944
[1] 15519.44
> mean(Income)
[1] 15519.44

# The new standard deviation should be 100 times the old standard deviation:

> sd(FIT)
[1] 68.24366
> 100*68.24366
[1] 6824.366
> sd(Income)
[1] 6824.366
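
# More generally, for Y = a + b*X the new mean is a + b times the old mean, and
# the new standard deviation is the absolute value of b times the old one (the
# added constant has no effect on spread). A sketch with arbitrary illustrative
# constants a and b, which are not values we used in class:

> a <- 50
> b <- -2
> all.equal(mean(a + b*Statlab$FIT), a + b*mean(Statlab$FIT))
[1] TRUE
> all.equal(sd(a + b*Statlab$FIT), abs(b)*sd(Statlab$FIT))
[1] TRUE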

# The same ideas work for median and interquartile range. (Recall that the lower-case
# iqr() function is one that we wrote in a previous class, not a native R function.)

> median(FIT)
[1] 144
> 100*144
[1] 14400
> median(Income)
[1] 14400
> 0 + 100*144
[1] 14400
> iqr(FIT)
[1] 78
> iqr(Income)
[1] 7800
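
# R's native upper-case IQR() may define the quartiles slightly differently
# from our class iqr() function, so its value for FIT need not be exactly 78,
# but it follows the same rule under this linear transformation:

> all.equal(IQR(100*Statlab$FIT), 100*IQR(Statlab$FIT))
[1] TRUE
> all.equal(median(100*Statlab$FIT), 100*median(Statlab$FIT))
[1] TRUE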
 
# As demonstrated in class (see the "whiteboard" link for today), the
# Z score is a special case of a linear transformation, and the rules
# for mean and standard deviation under linear transformation show that
# the Z score will have a mean of 0 and sd of 1.
 
> AllPeabody <- Statlab$CTPEA
> head(AllPeabody)
[1] 85 74 64 87 87 83
> mean(AllPeabody)
[1] 79.08642
> sd(AllPeabody)
[1] 10.56681
 
# Here, we create Z scores for the 1296 Peabody values:

> ZPeabody <- (AllPeabody-mean(AllPeabody))/sd(AllPeabody)

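# As predicted, they have mean=0 and sd=1. (The mean prints as a tiny number
# on the order of 1e-16 rather than exactly 0 because of floating-point
# rounding error.)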

> mean(ZPeabody)
[1] 2.110366e-16
> sd(ZPeabody)
[1] 1
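
# One way to check "zero, apart from rounding error" directly is to compare the
# absolute value of the mean with a very small tolerance. The cutoff 1e-12 is
# an arbitrary choice for this sketch:

> abs(mean(ZPeabody)) < 1e-12
[1] TRUE
> all.equal(sd(ZPeabody), 1)
[1] TRUE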

# This can be a useful intermediate step if we wish to change the
# metric of a variable. For example, if we wanted to express our
# Peabody scores in a metric more commonly used for intelligence
# measures, we could linearly transform the Z scores to have a
# mean of 100 and a sd of 15:
 
> IQPeabody <- 100 + 15*ZPeabody
> mean(IQPeabody)
[1] 100
> sd(IQPeabody)
[1] 15
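
# The same two steps (standardize, then rescale) can be wrapped in a small
# helper. This is only a sketch: "rescale" and "TPeabody" are names introduced
# here, and the target of mean 50 and sd 10 (the T-score metric) is just one
# more example of a conventional metric.

> rescale <- function(x, newmean, newsd) newmean + newsd*(x - mean(x))/sd(x)
> TPeabody <- rescale(AllPeabody, 50, 10)
> mean(TPeabody)
[1] 50
> sd(TPeabody)
[1] 10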
 
# R does have a function that creates Z scores (shown here with the 40-score
# Peabody variable from earlier, not AllPeabody):

> help(scale)
starting httpd help server ... done
> mean(scale(Peabody))
[1] 2.515349e-16
> sd(scale(Peabody))
[1] 1
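
# One detail worth knowing: scale() returns a one-column matrix rather than a
# plain variable. Wrapping it in as.vector() gives something directly
# comparable to the Z scores we computed by hand:

> is.matrix(scale(AllPeabody))
[1] TRUE
> all.equal(as.vector(scale(AllPeabody)), ZPeabody)
[1] TRUE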
 
# It's important to realize that rules for changes in mean
# and standard deviation don't work for nonlinear transformation.
# For example, the mean of the square root of income...

> RootIncome <- sqrt(Income)
> mean(RootIncome)
[1] 121.8822

# ...is not the same as the square root of the mean of income:

> sqrt(mean(Income))
[1] 124.5771
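
# A two-value example makes the same point by hand. With the values 1 and 9
# (made up for this illustration), the square roots are 1 and 3, so their mean
# is 2, but the mean of the original values is 5, and its square root is larger:

> x <- c(1, 9)
> mean(sqrt(x))
[1] 2
> sqrt(mean(x))
[1] 2.236068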