# We started by demonstrating the normal density
# equation. First, here's a normal curve (mean=50,
# sd=10):

> x <- seq(20,80,.1); y <- dnorm(x, 50, 10)
> plot(x,y,type='l')
> abline(h=0)

# Here's the equation of the normal density for that
# particular normal curve, evaluated at x=50:

> 1/sqrt(2*pi*100) * exp(-1/2*(50-50)^2/100)
[1] 0.03989423

# We can see that that value exactly matches the peak
# of the normal curve:

> abline(h=0.03989423)

# Another way to calculate it is to use R's dnorm() for the
# Z score of the value ((x-mean)/sd) and divide by the standard
# deviation. Here, x=50, so the Z score would be (50-50)/10 = 0:

> 1/10 * dnorm(0)
[1] 0.03989423

# Here, we do the calculation for an x of 30, and we can see
# that it matches the height of the curve in the figure where
# x = 30:

> 1/sqrt(2*pi*100) * exp(-1/2*(30-50)^2/100)
[1] 0.005399097

# We introduced the idea of logarithms. If you have encountered
# them before, it has probably been logarithms with base 10.
# The log base 10 of a number is the power to which 10 has to be
# raised to give you that number. So, for example, the log base
# 10 of 100 is

> log10(100)
[1] 2

# because

> 10^2
[1] 100

# Similarly, log10 of 1000 is 3 because 10^3 is 1000:

> log10(1000)
[1] 3
> log10(10000)
[1] 4

# It is possible to calculate logs that aren't so obvious.
# For example, log10 of 2 is

> log10(2)
[1] 0.30103

# because

> 10^.30103
[1] 2

# In statistics, when we use logs, they are almost always logs
# with base e. The base, e, is a transcendental number (i.e., one
# that is not the root of any polynomial with integer coefficients)
# that is approximately 2.71828. The exp() function in R raises e
# to a power, so we can see what e is by raising e to the
# first power:

> exp(1)
[1] 2.718282

# So, whereas log10 of 100 is 2, the natural log of 100 is

> log(100)
[1] 4.60517

# because

> 2.718282^4.60517
[1] 100

# If we consider a function, the log of that function is
# what we call "monotonically" related to the function. That
# means that it will always be larger when the function itself
# is larger and smaller when the function is smaller. Because of
# that, if the function has a maximum or minimum, the log of the
# function will have a maximum or minimum at the same point.
# Here's an example:

> x <- c(1:10,10:1)
> plot(x,type='l')
> lines(log(x))

# Note that the second set of lines goes up and down with the
# original lines, but lower. Nevertheless, it reaches its maximum
# at the same point.

# A student asked if one can take the log of transcendental numbers
# such as pi. The answer is Yes:

> log(pi)
[1] 1.14473
> exp(1.14473)
[1] 3.141593

# We digressed to work with a slide rule app to explore the relationship
# between the addition of logs and the multiplication of the original
# (unlogged) numbers. After several easy examples like 2*2=4 on the slide
# rule, we tried pi * 1.84 and read off the value 5.78. How well did we do?

> pi*1.84
[1] 5.78053

# Wow, pretty good!

# Next, we turned to finding the maximum likelihood estimate of mu in a
# normal distribution by finding where the log likelihood reaches its peak.
# The normal distribution is a particularly easy case to deal with because
# even though it has two parameters (mu and sigma-squared), they are independent
# in the sampling distribution, so the estimate of one doesn't affect the
# estimate of the other. For that reason, we can simplify the log likelihood
# for mu by fixing sigma-squared to an arbitrary value, one.
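
# (A quick sketch added here, not part of the class session: with
# sigma-squared fixed at one, the full normal log likelihood is
# sum(dnorm(x, mu, 1, log=TRUE)), which works out to
# -n/2 * log(2*pi) - 1/2 * sum((x-mu)^2). The first term doesn't
# involve mu, so dropping it cannot move the peak. A check with
# made-up numbers -- x.demo and mu.demo are hypothetical, just for
# illustration:)

x.demo <- c(48, 52, 61)
mu.demo <- 55
sum(dnorm(x.demo, mu.demo, 1, log = TRUE))                        # full log likelihood
sum(-1/2 * (x.demo - mu.demo)^2) - length(x.demo)/2 * log(2*pi)   # same value, with the constant written out
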
# That leaves us with the following very simple log likelihood:

> LLmu <- function(x,mu) {
+ return(sum(-1/2 * (x-mu)^2)) }

# We try several different possible values for the mean of
# the Peabody distribution:

> attach(JackStatlab)
> LLmu(CTPEA, 50)
[1] -29470.5
> LLmu(CTPEA, 55)
[1] -22190.5
> LLmu(CTPEA, 60)
[1] -16160.5
> LLmu(CTPEA, 70)
[1] -7850.5
> LLmu(CTPEA, 80)
[1] -4540.5
> LLmu(CTPEA, 90)
[1] -6230.5
> LLmu(CTPEA, 85)
[1] -4760.5

# It looks like the value that maximizes the log likelihood
# is somewhere near 80, so let's search an area near 80:

> x <- seq(70,90,.1)
> y <- NULL
> for(i in 1:length(x)) y[i] <- LLmu(CTPEA,x[i])
> plot(x,y,type='l')

# We look at an even narrower range of possible mu's:

> x <- seq(81,83,.01)
> for(i in 1:length(x)) y[i] <- LLmu(CTPEA,x[i])
> plot(x,y,type='l')

# Here's the sample mean:

> mean(CTPEA)
[1] 81.62

# We can see that the log likelihood curve reaches its
# maximum exactly at the sample mean. So we have learned that
# for a normal distribution, the maximum likelihood estimator
# for mu is the sample mean:

> abline(v=81.62)

# The log likelihood function for the variance is a bit more
# complicated (because terms that just involve the variance
# but not mu don't drop out):

> LL <- function(x, mu, var) {
+ return(sum(-1/2*log(2*pi*var) - 1/2*((x-mu)^2 / var))) }

# So here's the log likelihood for mu=50, var=1:

> LL(CTPEA, 50, 1)
[1] -29516.45

# The value of the mean doesn't matter here, so let's just use
# the sample mean and try different possible values for the
# variance:

> LL(CTPEA, mean(CTPEA), 200)
[1] -200.7793
> LL(CTPEA, mean(CTPEA), 300)
[1] -203.4578
> LL(CTPEA, mean(CTPEA), 100)
[1] -205.8251
> LL(CTPEA, mean(CTPEA), 190)
[1] -200.6746

# It looks like the function is peaking somewhere near but a bit
# below 200, so let's try an appropriate range of possible values
# for the variance:

> x <- seq(100,200,.1)
> for(i in 1:length(x)) y[i] <- LL(CTPEA, mean(CTPEA), x[i])
> plot(x,y,type='l')

# The sample mean was the maximum likelihood estimator for mu, so
# we might guess that the sample variance is the maximum likelihood
# estimator for sigma-squared:

> var(CTPEA)
[1] 182.6486

# But it actually isn't. Note that the sample variance estimate
# is actually a little larger than the point at which the log
# likelihood peaks:

> abline(v=var(CTPEA))

# Instead, it peaks at the POPULATION variance (i.e., the biased
# estimator that is the sum of squared deviations from the mean
# divided by N rather than by N-1):

> 49*var(CTPEA) / 50
[1] 178.9956
> abline(v=49*var(CTPEA) / 50 )

# From this, we learn that maximum likelihood estimators are
# often biased. But that's not necessarily a bad thing. One
# good property of maximum likelihood estimators is that they
# are "minimum variance estimators," which means that no other
# estimator can have a sampling distribution with lower variability.
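
# (An aside added here, not from the class session: rather than reading
# the peaks off the plots, we could let R's optimize() find them
# numerically. Assuming JackStatlab is still attached so that CTPEA is
# visible, the first call should land essentially on the sample mean,
# 81.62, and the second essentially on the N-divisor variance, 178.9956:)

optimize(function(m) LLmu(CTPEA, m), interval = c(70, 90), maximum = TRUE)$maximum
optimize(function(v) LL(CTPEA, mean(CTPEA), v), interval = c(100, 300), maximum = TRUE)$maximum
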
# We can see that minimum-variance property by using simulation. Here,
# we calculate the sample variance (i.e., using N-1 when we divide) for
# half a million samples of N=15 from a normal distribution with a mean
# of 50 and a variance of 100:

> x <- NULL
> for(i in 1:500000) x[i] <- var(rnorm(15, 50, 10))

# Here's the sampling distribution:

> hist(x)

# And here's the standard deviation of the sampling distribution
# (i.e., the standard error of the sample variance):

> sd(x)
[1] 37.80832

# Here, we convert the sample variance estimates into population
# variance estimates (i.e., dividing by N instead of N-1):

> hist( 14*x/15 )

# Note that the standard error of the new sampling distribution
# is smaller than the standard error for the sample estimates.
# That HAS TO be the case, because we know that maximum likelihood
# estimators are "minimum variance" estimators:

> sd( 14*x/15 )
[1] 35.28777

# So that's a good property of the maximum likelihood estimator.
# But we also know that, whereas the sample variance is unbiased...

> mean(x)
[1] 100.0083

# ...the maximum likelihood estimator is biased downward:

> mean( 14*x/15 )
[1] 93.34108

# That sounds bad. But if we consider the root mean squared error,
# which is calculated like the standard deviation of the sampling
# distribution, but focusing on deviations from the TRUE VALUE of the
# thing being estimated rather than from its mean, the sample variance
# tends to be further from the true value...

> sqrt( sum((x-100)^2/500000 ))
[1] 37.80828

# ...than the maximum likelihood estimate:

> xmle <- 14*x/15
> sqrt( sum((xmle-100)^2/500000 ))
[1] 35.91052

# So despite being biased, the maximum likelihood estimate tends
# to be closer to the truth!
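
# (A closing note added here, not from the class session: the reason a
# biased estimator can still win on root mean squared error is the
# decomposition MSE = variance + bias^2. With half a million samples,
# the N-1 vs. N difference in sd() is negligible, so the two RMSEs above
# can be recovered from the standard errors and biases already computed:)

sqrt( sd(x)^2 + (mean(x) - 100)^2 )         # should come out near 37.81
sqrt( sd(xmle)^2 + (mean(xmle) - 100)^2 )   # should come out near 35.91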