# We had some uninteresting variables left over from last time:

> ls()
[1] "names" "x"     "y"    
> names
[1] "Jack" "Kyle"
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> y
[1]    1.000000    3.141593   32.000000 1634.000000

# This removes them:

> rm(names,x,y)
> ls()
character(0)
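
# A minimal sketch of checking that rm() really removed something. The
# exists() call and its inherits = FALSE argument are additions for
# illustration; they were not part of the session above. The commented-out
# last line is a common idiom for clearing the whole workspace at once.

```r
x <- 1:10
y <- c(1, pi, 32, 1634)

# exists() reports whether an object with this name is in the workspace
exists("x", inherits = FALSE)   # TRUE

rm(x, y)
exists("x", inherits = FALSE)   # FALSE

# To remove everything in the workspace in one step (use with care):
# rm(list = ls())
```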

# If you ever want to see previous commands you have entered in R,
# the history() function can give you that. Here, I ask for the last
# 50 commands:

> history(50)

# We don't see anything in the R console, but a pop-up text window
# opened listing the last 50. I pasted a few of them into R to reconstruct
# what we had done previously:

> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> 2*x
 [1]  2  4  6  8 10 12 14 16 18 20
> x^2
 [1]   1   4   9  16  25  36  49  64  81 100
> y <- c(1, pi, 32, 1634)
> y
[1]    1.000000    3.141593   32.000000 1634.000000
> sqrt(y)
[1]  1.000000  1.772454  5.656854 40.422766
> mean(x)
[1] 5.5
> mean(y)
[1] 417.5354
> sd(x)
[1] 3.02765
> SD(x)
Error in SD(x) : could not find function "SD"
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> mode(x)
[1] "numeric"
> names <- c("Jack","Kyle")
> mode(names)
[1] "character"
> mean(names)
[1] NA
Warning message:
In mean.default(names) : argument is not numeric or logical: returning NA
> mean(x)
[1] 5.5
> names=="Jack"
[1]  TRUE FALSE
> mode(names=="Jack")
[1] "logical"
> ls()
[1] "names" "x"     "y"    
> setwd("c:/users/jvevea/desktop/Classes/105/2021 Spring")
> history(100)
> 2 + 2
[1] 4
> (2 + 2)^2
[1] 16
> ls()
[1] "names" "x"     "y"    
 
# (For explanations of any of the pasted commands, see the
# annotated R transcript from 1/19/2021.) 
 
# Here, I show what happens if you paste commands directly
# from one of the annotated transcripts. R gets really confused,
# because it sees the embedded command prompts (">") as greater-than
# symbols and cannot make sense of the code, so lots of
# errors are generated:

> # R is a versatile program. At its simplest, it can be
> # a simple calculator:
> > 2 + 2
Error: unexpected '>' in ">"
> [1] 4
Error: unexpected '[' in "["
> > (2 + 2)^2
Error: unexpected '>' in ">"
> [1] 16
Error: unexpected '[' in "["
> 
> # It is possible to create and save variables in R. Here,
> # for example, we create a variable called "x" which contains
> # the integers from 1 to 10:
> 
> > x <- 1:10
Error: unexpected '>' in ">"
> > x
Error: unexpected '>' in ">"
>  [1]  1  2  3  4  5  6  7  8  9 10
Error: unexpected '[' in " ["
> 
> # If we perform an arithmetic operation on such a variable, R
> # will apply the operation to every element of the variable:
> 
> > 2*x
Error: unexpected '>' in ">"
>  [1]  2  4  6  8 10 12 14 16 18 20
Error: unexpected '[' in " ["
> > x^2
Error: unexpected '>' in ">"
>  [1]   1   4   9  16  25  36  49  64  81 100
Error: unexpected '[' in " ["

# Here, I pasted exactly the same commands using the
# "paste commands only" item from the "edit" drop-down
# menu. This, unfortunately, is not available on Macs,
# but on Windows machines, it strips out all of the stuff
# that R found confusing:

> 2 + 2
[1] 4
> (2 + 2)^2
[1] 16
> x <- 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> 2*x
 [1]  2  4  6  8 10 12 14 16 18 20
> x^2
 [1]   1   4   9  16  25  36  49  64  81 100
> ls()
[1] "names" "x"     "y"    
> rm(names,x,y)
> ls()
character(0)
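
# Where "paste commands only" is not available, something similar can be
# approximated in R itself. This is only a sketch, not what the menu item
# actually does: the object names (transcript, is_command, commands) are
# made up for illustration, and it assumes every command fits on one line
# that starts with "> " (it does not handle "+" continuation lines).

```r
# A few lines copied from an annotated transcript, prompts and output included
transcript <- c(
  "> 2 + 2",
  "[1] 4",
  "> (2 + 2)^2",
  "[1] 16",
  "> x <- 1:10"
)

# Keep only the lines that begin with the prompt, then strip the prompt
is_command <- grepl("^> ", transcript)
commands <- sub("^> ", "", transcript[is_command])
commands
# [1] "2 + 2"     "(2 + 2)^2" "x <- 1:10"
```

# The cleaned-up lines in "commands" no longer contain anything R would
# mistake for a greater-than symbol, so they can be pasted safely.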
 

# Here's how you can read a csv file from a web page: 

> read.csv("http://faculty.ucmerced.edu/jvevea/classes/105/data/Peabody.csv") -> Peabody
> ls()
[1] "Peabody"
> Peabody
   Peabody
1       69
2       72
3       94
4       64
5       80
6       77
7       96
8       86
9       89
10      69
11      92
12      71
13      81
14      90
15      84
16      76
17     100
18      57
19      61
20      84
21      81
22      65
23      87
24      92
25      89
26      79
27      91
28      65
29      91
30      81
31      86
32      85
33      95
34      93
35      83
36      76
37      84
38      90
39      95
40      67
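
# read.csv() works the same way on a file on your own computer. The sketch
# below does not use the class web page: it writes a tiny temporary file
# holding the column name and the first six scores, then reads it back.
# The names csv_path and scores_frame are made up for illustration.

```r
# Create a small CSV file: first line is the variable name, then the values
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Peabody", "69", "72", "94", "64", "80", "77"), csv_path)

# read.csv() returns a data frame, using the first line as the column name
scores_frame <- read.csv(csv_path)
class(scores_frame)   # "data.frame"
names(scores_frame)   # "Peabody"
nrow(scores_frame)    # 6

# Clean up the temporary file
unlink(csv_path)
```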

# The "head()" function shows the first few cases of a
# variable or data frame. Note that the first line is the
# name of the variable:

> head(Peabody)
  Peabody
1      69
2      72
3      94
4      64
5      80
6      77

# Because Peabody is a data frame (a table with a named column) rather than
# a plain vector of numbers, R gets confused if we ask it to do numerical
# operations on it directly:

> mean(Peabody)
[1] NA
Warning message:
In mean.default(Peabody) : argument is not numeric or logical: returning NA

# We can extract the variable named "Peabody" from the data frame named "Peabody"
# using dollar-sign notation:

> mean(Peabody$Peabody)
[1] 81.675
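
# A small sketch of why the dollar sign is needed, using a made-up data
# frame called "scores_frame" that holds just the first six scores. The
# is.numeric() checks and the double-bracket form are additions for
# illustration; they do the same job as the dollar sign here.

```r
scores_frame <- data.frame(Peabody = c(69, 72, 94, 64, 80, 77))

# The data frame as a whole is not numeric, but the column inside it is
is.numeric(scores_frame)           # FALSE
is.numeric(scores_frame$Peabody)   # TRUE

# Two equivalent ways to pull the column out of the data frame
mean(scores_frame$Peabody)         # 76
mean(scores_frame[["Peabody"]])    # 76
```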

# We can also do what's called "attaching" the data frame. Usually, this will
# let us refer directly to the embedded variable without using the dollar sign.
# Here, however, we get a message about the Peabody object being "masked." That's
# because we already have something called Peabody, namely the data frame itself,
# and the pre-existing object of that name takes priority. So numerical operations
# still won't work:

> attach(Peabody)
The following object is masked _by_ .GlobalEnv:

    Peabody

> mean(Peabody)
[1] NA
Warning message:
In mean.default(Peabody) : argument is not numeric or logical: returning NA

# We can get around that by using a name for the data frame that doesn't
# overlap with the name of a variable in the data frame:

> PeabodyFrame <- Peabody
> rm(Peabody)
> detach(Peabody)

# Now if we attach the newly named data frame, we can refer directly to
# its components without needing the dollar sign:

> attach(PeabodyFrame)
> mean(Peabody)
[1] 81.675
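
# An alternative that avoids attaching (and so avoids masking messages) is
# the with() function, which looks up variable names inside a data frame
# for one command only. This is a sketch on a made-up six-row data frame
# called "small_frame", not something we did in the session.

```r
small_frame <- data.frame(Peabody = c(69, 72, 94, 64, 80, 77))

# Inside with(), "Peabody" refers to the column of small_frame
with(small_frame, mean(Peabody))     # 76
with(small_frame, length(Peabody))   # 6
```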

# The length() function in R tells us how many cases there are for
# the variable. Here, we had Peabody scores from 40 kids:

> length(Peabody)
[1] 40

# We can get a better sense of what the variable is like by sorting
# it into ascending order:

> sort(Peabody)
 [1]  57  61  64  65  65  67  69  69  71  72  76  76  77  79  80  81  81  81  83
[20]  84  84  84  85  86  86  87  89  89  90  90  91  91  92  92  93  94  95  95
[39]  96 100
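
# A few related tools, sketched on the first six scores only (stored in a
# made-up vector called "scores"). None of these were run in the session;
# they are standard R functions shown here for illustration.

```r
scores <- c(69, 72, 94, 64, 80, 77)

# Largest to smallest instead of smallest to largest
sort(scores, decreasing = TRUE)   # 94 80 77 72 69 64

# Smallest and largest values together
range(scores)                     # 64 94

# order() gives the positions of the values in ascending order:
# the smallest value (64) is the 4th element, the next (69) is the 1st, ...
order(scores)                     # 4 1 2 6 5 3
```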

# We can also get a simple picture of the distribution using a
# stem-and-leaf plot. Values to the left of the vertical bar represent
# coarse-grained information; values to the right are fine-grained
# information. For example, the first line of the plot tells us that
# the smallest Peabody value is in the fifties; the value to the right
# of the bar tells us specifically that it's 57. The numbers collectively
# end up making a nice picture that shows us which ranges of values are
# more or less frequent:

> stem(Peabody)

  The decimal point is 1 digit(s) to the right of the |

   5 | 7
   6 | 14
   6 | 55799
   7 | 12
   7 | 6679
   8 | 01113444
   8 | 566799
   9 | 00112234
   9 | 556
  10 | 0
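
# To see where the stems and leaves come from, here is a rough sketch that
# builds them by hand for the six smallest scores. It is not how stem()
# works internally, and unlike the plot above it does not split each decade
# into two lines. The names low_scores, stems, leaves, and leaf_strings are
# made up for illustration.

```r
low_scores <- c(57, 61, 64, 65, 65, 67)

# Integer division by 10 gives the tens digit (the stem);
# the remainder after dividing by 10 gives the ones digit (the leaf)
stems  <- low_scores %/% 10   # 5 6 6 6 6 6
leaves <- low_scores %% 10    # 7 1 4 5 5 7

# Glue together the leaves that share a stem
leaf_strings <- sapply(split(leaves, stems), paste, collapse = "")
leaf_strings
#       5       6
#     "7" "14557"
```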

# Here are a couple of common measures of central tendency. The mean
# is the arithmetic average: add up all the values and divide by the
# number of values:

> mean(Peabody)
[1] 81.675
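
# The "add up and divide" recipe can be checked by hand. In this sketch the
# 40 scores are typed into a made-up vector called "scores" so that the
# example stands on its own:

```r
scores <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
            92, 71, 81, 90, 84, 76, 100, 57, 61, 84,
            81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
            86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

sum(scores)                    # 3267
length(scores)                 # 40
sum(scores) / length(scores)   # 81.675, the same as mean(scores)
```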

# The median is the unique central observation in an ordered data set
# if there is an odd number of cases. Here, though, there are 40 Peabody
# scores (an even number), so the median is the average of the two centermost
# observations, which are both 84:

> median(Peabody)
[1] 84
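
# Here is the same median worked out by hand, as a sketch. The 40 scores
# are retyped into a made-up vector called "scores"; the names sorted, n,
# and middle_two are also just for illustration. The shortcut of taking
# positions n/2 and n/2 + 1 only applies when n is even.

```r
scores <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
            92, 71, 81, 90, 84, 76, 100, 57, 61, 84,
            81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
            86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

sorted <- sort(scores)
n <- length(sorted)                          # 40, an even number

# The two centermost observations are the 20th and 21st sorted values
middle_two <- sorted[c(n / 2, n / 2 + 1)]    # 84 84
mean(middle_two)                             # 84, the same as median(scores)
```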

# As we have seen before, the mode() function gives us something that has
# nothing to do with the central tendency definition of mode (most frequently
# occurring value):

> mode(Peabody)
[1] "numeric"

# Producing a table of the values and their counts could help us identify the
# mode, but with this type of variable, that's rarely very useful. For example,
# here we see that 81 and 84 are the most frequently occurring values:

> table(Peabody)
Peabody
 57  61  64  65  67  69  71  72  76  77  79  80  81  83  84  85  86  87  89  90 
  1   1   1   2   1   2   1   1   2   1   1   1   3   1   3   1   2   1   2   2 
 91  92  93  94  95  96 100 
  2   2   1   1   2   1   1 
 
# But if we look at the grouped data (say, by examining the stem-and-leaf plot),
# we can get a very different impression of where the mode or modes are. Generally,
# mode is going to be a useful way to represent a typical value only for discrete
# variables like eye color or gender preference.
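
# One simple way to group the scores is by decade. This sketch uses a
# made-up vector "scores" holding the 40 values and a made-up variable
# "decade"; grouping by tens is just one choice (the stem-and-leaf plot
# above uses groups of five).

```r
scores <- c(69, 72, 94, 64, 80, 77, 96, 86, 89, 69,
            92, 71, 81, 90, 84, 76, 100, 57, 61, 84,
            81, 65, 87, 92, 89, 79, 91, 65, 91, 81,
            86, 85, 95, 93, 83, 76, 84, 90, 95, 67)

# Round each score down to the start of its decade (57 becomes 50, etc.)
decade <- (scores %/% 10) * 10
table(decade)
# decade
#  50  60  70  80  90 100
#   1   7   6  14  11   1
```

# Grouped this way, the eighties are clearly the most common range.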