# We learned about sampling in R, in order to facilitate the process of
# selecting your sample for Homework 1.

# Here, we use the "sample" function to take 5 random draws from the
# integers 1 to 10:

> sample( 1:10, 5)
[1]  5 10  7  1  8
> sample( 1:10, 5)
[1] 6 9 7 1 8
> sample( 1:10, 5)
[1]  3  2  5 10  9
> sample( 1:10, 5)
[1]  5  3  4 10  7

# Notice that we got a different sample each time, with no duplication.
# That's because, by default, R samples without replacement: Once a number
# has been drawn, it can't be drawn again. For that reason, if we try to
# sample more than 10 draws from the integers 1 to 10, it won't work:

> sample( 1:10, 15)
Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'

# We can make R sample WITH replacement like this:

> sample( 1:10, 15, replace=TRUE)
 [1]  5  9 10  6  3  7  3  6  3  5  5  5  2  7  3

# And if we want to be absolutely certain that we are sampling without
# replacement, we can specify that:

> sample( 1:10, 15, replace=FALSE)
Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE'

# Here's my Peabody data set:

> Peabody
 [1]  69  72  94  64  80  77  96  86  89  69  92  71  81  90  84  76 100  57  61
[20]  84  81  65  87  92  89  79  91  65  91  81  86  85  95  93  83  76  84  90
[39]  95  67

# Even though I sample without replacement, I draw two 69s. But that's because
# there's more than one 69 in the data set. They're the same value, but not the
# same person.

> sample(Peabody, 5, replace=FALSE)
[1] 83 57 69 84 69

# Here, I create a variable called "choices" to represent 50 individual cases
# that I will sample from the Statlab data set. It's very important that this
# sampling be WITHOUT replacement.
> choices <- sample(1:1296, 50, replace=FALSE)
> choices
 [1]  733 1238 1027  847  646  447  322  236 1234  439  979  819 1044 1051  104
[16]  170   63 1151  634  366  673  372 1070  813  221  423   31   78  344  367
[31] 1282  727   64  581  128  384  588  348  463 1176  732    7  288  310  237
[46]  561   74  799  450  114
> choices <- sort(choices)
> choices
 [1]    7   31   63   64   74   78  104  114  128  170  221  236  237  288  310
[16]  322  344  348  366  367  372  384  423  439  447  450  463  561  581  588
[31]  634  646  673  727  732  733  799  813  819  847  979 1027 1044 1051 1070
[46] 1151 1176 1234 1238 1282

# So my sample will contain the 7th, 31st, 63rd (and so on) case in the data set.

# Once again, here's how to read in the Statlab data set. (You can copy and paste
# this.)

> Statlab <- read.csv("http://faculty.ucmerced.edu/jvevea/classes/202a/data/statlab (abridged).csv")
> head(Statlab)
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
1 1111     0   20.0   6.6   55.7    85    85   34   17   119   66.0   130   19
2 1112     0   20.0   6.4   48.9    59    74   34   17   130   62.8   159   23
3 1113     0   19.8   6.1   54.9    70    64   25   18   134   66.1   138   21
4 1114     0   19.5   7.0   53.6    88    87   43   18   135   61.8   123   26
5 1115     0   19.5   7.9   53.4    68    87   40   18   130   62.8   146   21
6 1116     0   22.0   9.5   59.9    93    83   37   18   104   63.4   116   17
  FTHGHT FTWGT FIB FIT
1   70.1   171  33 150
2   65.0   130  40 175
3   70.0   175  44 116
4   71.8   196  42 112
5   68.0   163  50 129
6   74.0   180   0 214

# Now we need to understand a bit more about specifying particular elements
# in a data frame. We've already seen that we can use bracket notation to
# specify a particular case in a simple variable. Here, for example, we see
# that the third Peabody score is 94:

> Peabody
 [1]  69  72  94  64  80  77  96  86  89  69  92  71  81  90  84  76 100  57  61
[20]  84  81  65  87  92  89  79  91  65  91  81  86  85  95  93  83  76  84  90
[39]  95  67
> Peabody[3]
[1] 94

# When we have a variable with both rows and columns (called a "data frame"),
# we can use similar bracket notation to specify first a row and then a column.
# So, for example, the value of the third variable (CBLGTH) for the first child
# is 20, as you can see in the "head" of the data set, above, and using bracket
# notation here:

> Statlab[1,3]
[1] 20

# Similarly, the third child's value for CBWGT is 6.1:

> Statlab[3,4]
[1] 6.1

# If we want the entire row of values for a particular case, we can specify
# it like this:

> Statlab[1,]
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
1 1111     0     20   6.6   55.7    85    85   34   17   119     66   130   19
  FTHGHT FTWGT FIB FIT
1   70.1   171  33 150

# And we can get multiple rows that way:

> Statlab[1:3,]
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
1 1111     0   20.0   6.6   55.7    85    85   34   17   119   66.0   130   19
2 1112     0   20.0   6.4   48.9    59    74   34   17   130   62.8   159   23
3 1113     0   19.8   6.1   54.9    70    64   25   18   134   66.1   138   21
  FTHGHT FTWGT FIB FIT
1   70.1   171  33 150
2   65.0   130  40 175
3   70.0   175  44 116

# Note that we could do the same thing for columns. If I wanted every child's
# CBSEX (the second column), say, I could do it like this: Statlab[,2]. (I didn't
# do that in class because I didn't want to see 1296 values filling up the screen.)
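# As an aside, here is a minimal sketch of selecting a column by position, by
# name, and with $ notation. The data frame "toy" is a hypothetical stand-in
# (not part of the class materials), built from the first three rows and first
# three columns of the head(Statlab) output shown earlier:

```r
# Hypothetical toy data frame: first three rows and columns of head(Statlab)
toy <- data.frame(
  CODE   = c(1111, 1112, 1113),
  CBSEX  = c(0, 0, 0),
  CBLGTH = c(20.0, 20.0, 19.8)
)

by_position <- toy[, 2]        # second column, selected by position
by_name     <- toy[, "CBSEX"]  # same column, selected by name
by_dollar   <- toy$CBSEX       # same column, using $ notation

# All three approaches return the same vector
print(identical(by_position, by_name))
print(identical(by_name, by_dollar))

# head() limits how many values are printed (here, the first 2 lengths)
print(head(toy[, 3], 2))
```

# With the real data, wrapping the selection in head(), as in head(Statlab[,2]),
# would print only the first six values rather than all 1296.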
# Here are the 50 cases I randomly selected:

> choices
 [1]    7   31   63   64   74   78  104  114  128  170  221  236  237  288  310
[16]  322  344  348  366  367  372  384  423  439  447  450  463  561  581  588
[31]  634  646  673  727  732  733  799  813  819  847  979 1027 1044 1051 1070
[46] 1151 1176 1234 1238 1282

# The first choice was case #7:

> Statlab[7,]
  CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
7 1121     0     21   7.1   53.1    72    81   33   18   145   65.4   220   23
  FTHGHT FTWGT FIB FIT
7   68.1   173  55 142

# The next was case #31:

> Statlab[31,]
   CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
31 1161     0     21   6.8   52.8    64    94   25   20   140   66.3   147   28
   FTHGHT FTWGT FIB FIT
31     71   180  66 120

# If I create a new data frame, selecting the rows corresponding to ALL the
# choices, you'll see that the first two rows of the new data set are the
# 7th and 31st cases, which we just saw above (and the new data frame continues,
# including the 63rd, 64th, 74th cases, and so on).

> JackStatlab <- Statlab[choices,]
> head(JackStatlab)
   CODE CBSEX CBLGTH CBWGT CTHGHT CTWGT CTPEA CTRA MBAG MBWGT MTHGHT MTWGT FBAG
7  1121     0   21.0   7.1   53.1    72    81   33   18   145   65.4   220   23
31 1161     0   21.0   6.8   52.8    64    94   25   20   140   66.3   147   28
63 1253     0   19.0   5.6   56.8    80    84   42   21   124   63.8   139   22
64 1254     0   20.5   8.6   57.6    78    74   20   21   134   68.1   143   22
74 1312     0   20.0   6.4   50.5    63    71   21   21    98   62.9   100   22
78 1316     0   20.0   7.6   51.3    58    59   15   21   135   66.0   125   24
   FTHGHT FTWGT FIB FIT
7    68.1   173  55 142
31   71.0   180  66 120
63   72.0   170  37 100
64   73.5   185  90 220
74   72.0   160  80 104
78   67.5   169  40 169

# If I were doing the homework assignment, I would want to save this as a CSV
# file to submit to Catcourses:

> write.csv(JackStatlab, "c:/users/jvevea/Desktop/JackStatlab.csv")

# Previously, we've seen that we can avoid cumbersome $ notation (e.g., Statlab$CBSEX)
# by "attaching" the data frame:

> attach(Statlab)
> CBSEX
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [815] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1037] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1074] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1111] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1148] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1222] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1259] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1296] 1 # If I attach another data set that has at least some of the same variable names, # R will alert me to that fact: > attach(JackStatlab) The following objects are masked from Statlab: CBLGTH, CBSEX, CBWGT, CODE, CTHGHT, CTPEA, CTRA, CTWGT, FBAG, FIB, FIT, FTHGHT, FTWGT, MBAG, MBWGT, MTHGHT, MTWGT # Now, CBSEX refers to that variable in the most recently attached data frame: > CBSEX [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 [39] 1 1 1 1 1 1 1 1 1 1 1 1 # But if I detach JackStatlab, the originally attached data frame is still attached: > detach(JackStatlab) > CBSEX [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [519] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [667] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [704] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [741] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [778] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [815] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [852] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [889] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [926] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [963] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1000] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1037] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1074] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1111] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1148] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1185] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [1222] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1
[1259] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[1296] 1

# Here's the variable in my sample. This time, because only the larger data frame
# is attached, I need to use the $ notation:

> JackStatlab$CBSEX
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1

# Often, rather than attaching the data frame, it will be easier to create a new
# variable. This has the advantage of allowing us a name that will look better in
# graphics labels (without the need to specify new labels using subcommands like
# main and xlab):

> Sex <- JackStatlab$CBSEX
> Sex
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1

# We discussed the behavior of mean and standard deviation under linear transformation.
# We've already done an example of a linear transformation when we created a new
# family income variable. The original variable is in hundreds of dollars:

> head(FIT)
[1] 150 175 116 112 129 214

# In a previous session, we transformed that to the more convenient metric of dollars
# by multiplying by 100. Note that this is a linear transformation of the form
# Y = 0 + 100*X.

> head(Income)
[1] 15000 17500 11600 11200 12900 21400

# According to the rule for change under linear transformation, the new mean should
# be 0 plus 100 times the old mean:

> mean(FIT)
[1] 155.1944
> 0 + 100*155.1944
[1] 15519.44
> mean(Income)
[1] 15519.44

# The new standard deviation should be 100 times the old standard deviation:

> sd(FIT)
[1] 68.24366
> 100*68.24366
[1] 6824.366
> sd(Income)
[1] 6824.366

# The same ideas work for median and interquartile range. (Recall that the lower-case
# iqr() function is one that we wrote in a previous class, not a native R function.)
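# (The iqr() function from the earlier class is not reproduced in these notes.
# A minimal sketch of such a helper, assuming it is built on quantile() with R's
# default quantile method, might look like the following. The version written in
# class may compute the quartiles differently, so its results need not match.)

```r
# Hypothetical stand-in for the class's iqr(); assumes quantile()'s default method
iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  q[2] - q[1]   # third quartile minus first quartile
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)   # made-up values for illustration

print(iqr(x))         # interquartile range of x
print(iqr(100 * x))   # multiplying x by 100 multiplies the range by 100
print(IQR(x))         # R's built-in (upper-case) IQR, for comparison
```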
> median(FIT)
[1] 144
> 100*144
[1] 14400
> median(Income)
[1] 14400
> 0 + 100*144
[1] 14400
> iqr(FIT)
[1] 78
> iqr(Income)
[1] 7800

# As demonstrated in class (see the "whiteboard" link for today), the
# Z score is a special case of a linear transformation, and the rules
# for mean and standard deviation under linear transformation show that
# the Z score will have a mean of 0 and sd of 1.

> AllPeabody <- Statlab$CTPEA
> head(AllPeabody)
[1] 85 74 64 87 87 83
>
> mean(AllPeabody)
[1] 79.08642
> sd(AllPeabody)
[1] 10.56681

# Here, we create Z scores for the 1296 Peabody values:

> ZPeabody <- (AllPeabody-mean(AllPeabody))/sd(AllPeabody)

# As predicted, they have mean=0 and sd=1. (The mean prints as 2.110366e-16,
# which is zero to within floating-point rounding error.)

> mean(ZPeabody)
[1] 2.110366e-16
> sd(ZPeabody)
[1] 1

# This can be a useful intermediate step if we wish to change the
# metric of a variable. For example, if we wanted to express our
# Peabody scores in a metric more commonly used for intelligence
# measures, we could linearly transform the Z scores to have a
# mean of 100 and a sd of 15:

> IQPeabody <- 100 + 15*ZPeabody
> mean(IQPeabody)
[1] 100
> sd(IQPeabody)
[1] 15

# R does have a function that creates Z scores (applied here to the
# 40-score Peabody variable from earlier):

> help(scale)
starting httpd help server ... done
> mean(scale(Peabody))
[1] 2.515349e-16
> sd(scale(Peabody))
[1] 1

# It's important to realize that rules for changes in mean
# and standard deviation don't work for nonlinear transformation.
# For example, the mean of the square root of income...

> RootIncome <- sqrt(Income)
> mean(RootIncome)
[1] 121.8822

# ...is not the same as the square root of the mean of income:

> sqrt(mean(Income))
[1] 124.5771
>
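
# To pull the rules from this session together, here is a minimal sketch that
# checks them on a small made-up vector. The names x, a, b, y, z_manual, and
# z_scale are hypothetical and are not objects from the class session. One
# detail beyond the examples above: for a negative slope, the standard
# deviation is multiplied by the absolute value of the slope.

```r
x <- c(4, 9, 16, 25, 36)   # made-up values for illustration
a <- 100                   # intercept of the linear transformation
b <- 15                    # slope of the linear transformation
y <- a + b * x

# Linear transformation: mean becomes a + b*mean, sd becomes |b|*sd
print(all.equal(mean(y), a + b * mean(x)))
print(all.equal(sd(y), abs(b) * sd(x)))

# Z scores by hand agree with scale(), which returns a one-column matrix
z_manual <- (x - mean(x)) / sd(x)
z_scale  <- as.vector(scale(x))
print(all.equal(z_manual, z_scale))

# Nonlinear transformation: the mean of the square roots...
print(mean(sqrt(x)))
# ...is not the square root of the mean
print(sqrt(mean(x)))
```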