# We had some uninteresting variables left over from last time:

> ls()
[1] "names" "x"     "y"
> names
[1] "Jack" "Kyle"
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> y
[1]    1.000000    3.141593   32.000000 1634.000000

# This removes them:

> rm(names,x,y)
> ls()
character(0)

# If you ever want to see previous commands you have entered in R,
# the history() function can give you that. Here, I ask for the last
# 50 commands:

> history(50)

# We don't see anything in the R console, but a pop-up text window
# opened listing the last 50. I pasted a few of them into R to reconstruct
# what we had done previously:

> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> 2*x
 [1]  2  4  6  8 10 12 14 16 18 20
> x^2
 [1]   1   4   9  16  25  36  49  64  81 100
> y <- c(1, pi, 32, 1634)
> y
[1]    1.000000    3.141593   32.000000 1634.000000
> sqrt(y)
[1]  1.000000  1.772454  5.656854 40.422766
> mean(x)
[1] 5.5
> mean(y)
[1] 417.5354
> sd(x)
[1] 3.02765
> SD(x)
Error in SD(x) : could not find function "SD"
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> mode(x)
[1] "numeric"
> names <- c("Jack","Kyle")
> mode(names)
[1] "character"
> mean(names)
[1] NA
Warning message:
In mean.default(names) : argument is not numeric or logical: returning NA
> mean(x)
[1] 5.5
> names=="Jack"
[1]  TRUE FALSE
> mode(names=="Jack")
[1] "logical"
> ls()
[1] "names" "x"     "y"
> setwd("c:/users/jvevea/desktop/Classes/105/2021 Spring")
> history(100)
> 2 + 2
[1] 4
> (2 + 2)^2
[1] 16
> ls()
[1] "names" "x"     "y"

# (For explanations of any of the pasted commands, see the
# annotated R transcript from 1/19/2021.)

# Here, I show what happens if you paste commands directly
# from one of the annotated transcripts. R gets really confused,
# because it sees the embedded command prompts (">") as greater-than
# symbols, and cannot make sense of the code, so lots of
# errors are generated:

> # R is a versatile program.
> # At its simplest, it can be
> # a simple calculator:
> > 2 + 2
Error: unexpected '>' in ">"
> [1] 4
Error: unexpected '[' in "["
> > (2 + 2)^2
Error: unexpected '>' in ">"
> [1] 16
Error: unexpected '[' in "["
>
> # It is possible to create and save variables in R. Here,
> # for example, we create a variable called "x" which contains
> # the integers from 1 to 10:
>
> > x <- 1:10
Error: unexpected '>' in ">"
> > x
Error: unexpected '>' in ">"
>  [1]  1  2  3  4  5  6  7  8  9 10
Error: unexpected '[' in " ["
>
> # If we perform an arithmetic operation on such a variable, R
> # will apply the operation to every element of the variable:
>
> > 2*x
Error: unexpected '>' in ">"
>  [1]  2  4  6  8 10 12 14 16 18 20
Error: unexpected '[' in " ["
> > x^2
Error: unexpected '>' in ">"
>  [1]   1   4   9  16  25  36  49  64  81 100
Error: unexpected '[' in " ["

# Here, I pasted exactly the same commands using the
# "paste commands only" item from the "edit" drop down
# menu. This, unfortunately, is not available on Macs,
# but on Windows machines, it strips out all of the stuff
# that R found confusing:

> 2 + 2
[1] 4
> (2 + 2)^2
[1] 16
> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> 2*x
 [1]  2  4  6  8 10 12 14 16 18 20
> x^2
 [1]   1   4   9  16  25  36  49  64  81 100
> ls()
[1] "names" "x"     "y"
> rm(names,x,y)
> ls()
character(0)

# Here's how you can read a csv file from a web page:

> read.csv("http://faculty.ucmerced.edu/jvevea/classes/105/data/Peabody.csv") -> Peabody
> ls()
[1] "Peabody"
> Peabody
   Peabody
1       69
2       72
3       94
4       64
5       80
6       77
7       96
8       86
9       89
10      69
11      92
12      71
13      81
14      90
15      84
16      76
17     100
18      57
19      61
20      84
21      81
22      65
23      87
24      92
25      89
26      79
27      91
28      65
29      91
30      81
31      86
32      85
33      95
34      93
35      83
36      76
37      84
38      90
39      95
40      67

# The "head()" function shows the first few cases of a
# variable or data frame.
# Note that the first line is the
# name of the variable:

> head(Peabody)
  Peabody
1      69
2      72
3      94
4      64
5      80
6      77

# Because Peabody is a data frame (a table with a named column) rather
# than a plain set of numbers, R gets confused if we ask it
# to do numerical operations on it directly:

> mean(Peabody)
[1] NA
Warning message:
In mean.default(Peabody) : argument is not numeric or logical: returning NA

# We can extract the variable named "Peabody" from the data frame named "Peabody"
# using dollar-sign notation:

> mean(Peabody$Peabody)
[1] 81.675

# We can also do what's called "attaching" the data frame. Usually, this will
# let us refer directly to the embedded variable without using the dollar sign.
# Here, however, we get a message about the Peabody object being "masked." That's
# because we already have something called Peabody, namely the data frame itself,
# and the pre-existing object of that name takes priority. So numerical operations
# still won't work:

> attach(Peabody)
The following object is masked _by_ .GlobalEnv:

    Peabody

> mean(Peabody)
[1] NA
Warning message:
In mean.default(Peabody) : argument is not numeric or logical: returning NA

# We can get around that by using a name for the data frame that doesn't
# overlap with the name of a variable in the data frame:

> PeabodyFrame <- Peabody
> rm(Peabody)
> detach(Peabody)

# Now if we attach the newly named data frame, we can refer directly to
# its components without needing the dollar sign:

> attach(PeabodyFrame)
> mean(Peabody)
[1] 81.675

# The length() function in R tells us how many cases there are for
# the variable. Here, we had Peabody scores from 40 kids:

> length(Peabody)
[1] 40

# We can get a better sense of what the variable is like by sorting
# it into ascending sequence:

> sort(Peabody)
 [1]  57  61  64  65  65  67  69  69  71  72  76  76  77  79  80  81  81  81  83
[20]  84  84  84  85  86  86  87  89  89  90  90  91  91  92  92  93  94  95  95
[39]  96 100

# We can also get a simple picture of the distribution using a
# stem-and-leaf plot.
# Values to the left of the vertical bar represent
# coarse-grained information; values to the right are fine-grained
# information. For example, the first line of the plot tells us that
# the smallest Peabody value is in the fifties; the value to the right
# of the bar tells us specifically that it's 57. The numbers collectively
# end up making a nice picture that shows us which ranges of values are
# more or less frequent:

> stem(Peabody)

  The decimal point is 1 digit(s) to the right of the |

   5 | 7
   6 | 14
   6 | 55799
   7 | 12
   7 | 6679
   8 | 01113444
   8 | 566799
   9 | 00112234
   9 | 556
  10 | 0

# Here are a couple of common measures of central tendency. The mean
# is the arithmetic average: add up all the values and divide by the
# number of values:

> mean(Peabody)
[1] 81.675

# The median is the unique central observation in an ordered data set
# if there is an odd number of cases. Here, though, there are 40 Peabody
# scores (an even number), so the median is the average of the two centermost
# observations, which are both 84:

> median(Peabody)
[1] 84

# As we have seen before, the mode() function gives us something that has
# nothing to do with the central tendency definition of mode (most frequently
# occurring value):

> mode(Peabody)
[1] "numeric"

# Producing a table of the values and their counts could help us identify the
# mode, but with this type of variable, that's rarely very useful. For example,
# here we see that 81 and 84 are the most frequently occurring values:

> table(Peabody)
Peabody
 57  61  64  65  67  69  71  72  76  77  79  80  81  83  84  85  86  87  89  90
  1   1   1   2   1   2   1   1   2   1   1   1   3   1   3   1   2   1   2   2
 91  92  93  94  95  96 100
  2   2   1   1   2   1   1

# But if we look at the grouped data (say, by examining the stem-and-leaf plot),
# we can get a very different impression of where the mode or modes are. Generally,
# mode is going to be a useful way to represent a typical value only for discrete
# variables like eye color or gender preference.
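
# The three measures of central tendency discussed above can also be worked
# out step by step. What follows is only a minimal sketch, written as a script
# rather than pasted from the console. The names scores, sorted_scores,
# raw_modes, and grouped_modes are made up for this illustration, and the bin
# width of 5 is an assumption chosen to mirror the rows of the stem-and-leaf
# plot; other groupings could give a different picture.

```r
# Peabody scores retyped from the data frame listing earlier in this
# transcript, so that the sketch runs on its own.
scores <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69, 92, 71, 81, 90, 84, 76, 100,
            57, 61, 84, 81, 65, 87, 92, 89, 79, 91, 65, 91, 81, 86, 85, 95, 93,
            83, 76, 84, 90, 95, 67)

# Mean by hand: add up all the values and divide by the number of values.
hand_mean <- sum(scores) / length(scores)

# Median by hand: with an even number of cases, average the two centermost
# values of the sorted data.
sorted_scores <- sort(scores)
n <- length(sorted_scores)
hand_median <- (sorted_scores[n / 2] + sorted_scores[n / 2 + 1]) / 2

# Mode from the raw values: whichever values share the largest count.
counts <- table(scores)
raw_modes <- as.numeric(names(counts)[counts == max(counts)])

# Mode from grouped data: bins of width 5 (55-59, 60-64, and so on).
bins <- cut(scores, breaks = seq(55, 105, by = 5), right = FALSE)
grouped <- table(bins)
grouped_modes <- names(grouped)[grouped == max(grouped)]

print(hand_mean)
print(hand_median)
print(raw_modes)
print(grouped_modes)
```

# With these data, the hand calculations agree with mean() and median(), and
# raw_modes comes out as 81 and 84. The grouped counts instead peak in two
# bins, 80-84 and 90-94, with eight scores each, which is the "very different
# impression" the stem-and-leaf plot gives.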