# We started by demonstrating the normal density
# equation. First, here's a normal curve (mean=50,
# sd=10):

> x <- seq(20,80,.1); y <- dnorm(x, 50, 10)
> plot(x,y,type='l')
> abline(h=0)

# Here's the equation of the normal density for that
# particular normal curve, evaluated at x=50:

> 1/sqrt(2*pi*100) * exp(-1/2*(50-50)^2/100)
[1] 0.03989423

# We can see that that value exactly matches the peak
# of the normal curve:

> abline(h=0.03989423)

# Another way to calculate it is to use R's dnorm() for the
# Z score of the value ((x-mean)/sd) and divide by the standard
# deviation. Here, x=50, so the Z score would be (50-50)/10 = 0:

> 1/10 * dnorm(0)
[1] 0.03989423

# Here, we do the calculation for an x of 30, and we can see
# that it matches the height of the curve in the figure where
# x = 30:

> 1/sqrt(2*pi*100) * exp(-1/2*(30-50)^2/100)
[1] 0.005399097

# We introduced the idea of logarithms. If you have encountered
# them before, it has probably been logarithms with base 10.
# The log base 10 of a number is the power to which 10 has to be
# raised to give you that number. So, for example, the log base
# 10 of 100 is

> log10(100)
[1] 2

# because

> 10^2
[1] 100

# Similarly, log10 of 1000 is 3 because 10^3 is 1000:

> log10(1000)
[1] 3
> log10(10000)
[1] 4

# It is possible to calculate logs that aren't so obvious.
# For example, log10 of 2 is

> log10(2)
[1] 0.30103

# because

> 10^.30103
[1] 2

# In statistics, when we use logs, they are almost always logs
# with base e. The base, e, is a transcendental number (i.e., one
# that is not the root of any polynomial with integer coefficients)
# that is approximately 2.71828. The exp() function in R raises e
# to a power, so we can see what e is by raising e to the
# first power:

> exp(1)
[1] 2.718282

# So, whereas log10 of 100 is 2, the natural log of 100 is

> log(100)
[1] 4.60517

# because

> 2.718282^4.60517
[1] 100

# If we consider a function, the log of that function is
# what we call "monotonically" related to the function. That
# means that it will always be larger when the function itself
# is larger and smaller when the function is smaller. Because of
# that, if the function has a maximum or minimum, the log of the
# function will have a maximum or minimum at the same point.
# Here's an example:

> x <- c(1:10,10:1)
> plot(x,type='l')
> lines(log(x))

# Note that the second set of lines goes up and down with the
# original lines, but lower. Nevertheless, it reaches its maximum
# at the same point.

# A student asked if one can take the log of transcendental numbers
# such as pi. The answer is Yes:

> log(pi)
[1] 1.14473
> exp(1.14473)
[1] 3.141593

# We digressed to work with a slide rule app to explore the relationship
# between the addition of logs and the multiplication of the original
# (unlogged) numbers. After several easy examples like 2*2=4 on the slide
# rule, we tried pi * 1.84 and read off the value 5.78. How well did we do?

> pi*1.84
[1] 5.78053

# Wow, pretty good!

# Next, we turned to finding the maximum likelihood estimate of mu in a
# normal distribution by finding where the log likelihood reaches its peak.
# The normal distribution is a particularly easy case to deal with because
# even though it has two parameters (mu and sigma-squared), they are independent
# in the sampling distribution, so the estimate of one doesn't affect the
# estimate of the other. For that reason, we can simplify the log likelihood
# for mu by fixing sigma-squared to an arbitrary value, one.
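
# (A quick sketch added here, not part of the class session: with
# sigma-squared fixed at one, the full normal log likelihood is
# sum(dnorm(x, mu, 1, log=TRUE)), which works out to
# -n/2 * log(2*pi) - 1/2 * sum((x-mu)^2). The first term doesn't
# involve mu, so dropping it cannot move the peak. A check with
# made-up numbers -- x.demo and mu.demo are hypothetical, just for
# illustration:)

x.demo <- c(48, 52, 61)
mu.demo <- 55
sum(dnorm(x.demo, mu.demo, 1, log = TRUE))                        # full log likelihood
sum(-1/2 * (x.demo - mu.demo)^2) - length(x.demo)/2 * log(2*pi)   # same value, with the constant written out
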
# That leaves us with the following very simple log likelihood:

> LLmu <- function(x,mu) {
+ return(sum(-1/2 * (x-mu)^2)) }

# We try several different possible values for the mean of
# the Peabody distribution:

> attach(JackStatlab)
> LLmu(CTPEA, 50)
[1] -29470.5
> LLmu(CTPEA, 55)
[1] -22190.5
> LLmu(CTPEA, 60)
[1] -16160.5
> LLmu(CTPEA, 70)
[1] -7850.5
> LLmu(CTPEA, 80)
[1] -4540.5
> LLmu(CTPEA, 90)
[1] -6230.5
> LLmu(CTPEA, 85)
[1] -4760.5

# It looks like the value that maximizes the log likelihood
# is somewhere near 80, so let's search an area near 80:

> x <- seq(70,90,.1)
> y <- NULL
> for(i in 1:length(x)) y[i] <- LLmu(CTPEA,x[i])
> plot(x,y,type='l')

# We look at an even narrower range of possible mu's:

> x <- seq(81,83,.01)
> for(i in 1:length(x)) y[i] <- LLmu(CTPEA,x[i])
> plot(x,y,type='l')

# Here's the sample mean:

> mean(CTPEA)
[1] 81.62

# We can see that the log likelihood curve reaches its
# maximum exactly at the sample mean. So we have learned that
# for a normal distribution, the maximum likelihood estimator
# for mu is the sample mean:

> abline(v=81.62)

# The log likelihood function for the variance is a bit more
# complicated (because terms that just involve the variance
# but not mu don't drop out):

> LL <- function(x, mu, var) {
+ return(sum(-1/2*log(2*pi*var) - 1/2*((x-mu)^2 / var))) }

# So here's the log likelihood for mu=50, var=1:

> LL(CTPEA, 50, 1)
[1] -29516.45

# The value of the mean doesn't matter here, so let's just use
# the sample mean and try different possible values for the
# variance:

> LL(CTPEA, mean(CTPEA), 200)
[1] -200.7793
> LL(CTPEA, mean(CTPEA), 300)
[1] -203.4578
> LL(CTPEA, mean(CTPEA), 100)
[1] -205.8251
> LL(CTPEA, mean(CTPEA), 190)
[1] -200.6746

# It looks like the function is peaking somewhere near but a bit
# below 200, so let's try an appropriate range of possible values
# for the variance:

> x <- seq(100,200,.1)
> for(i in 1:length(x)) y[i] <- LL(CTPEA, mean(CTPEA), x[i])
> plot(x,y,type='l')

# The sample mean was the maximum likelihood estimator for mu, so
# we might guess that the sample variance is the maximum likelihood
# estimator for sigma-squared:

> var(CTPEA)
[1] 182.6486

# But it actually isn't. Note that the sample variance estimate
# is actually a little larger than the point at which the log
# likelihood peaks:

> abline(v=var(CTPEA))

# Instead, it peaks at the POPULATION variance (i.e., the biased
# estimator that is the sum of squared deviations from the mean
# divided by N rather than by N-1):

> 49*var(CTPEA) / 50
[1] 178.9956
> abline(v=49*var(CTPEA) / 50 )

# From this, we learn that maximum likelihood estimators are
# often biased. But that's not necessarily a bad thing. One
# good property of maximum likelihood estimators is that they
# are "minimum variance estimators," which means that no other
# estimator can have a sampling distribution with lower variability.
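
# (An aside added here, not from the class session: rather than reading
# the peaks off the plots, we could let R's optimize() find them
# numerically. Assuming JackStatlab is still attached so that CTPEA is
# visible, the first call should land essentially on the sample mean,
# 81.62, and the second essentially on the N-divisor variance, 178.9956:)

optimize(function(m) LLmu(CTPEA, m), interval = c(70, 90), maximum = TRUE)$maximum
optimize(function(v) LL(CTPEA, mean(CTPEA), v), interval = c(100, 300), maximum = TRUE)$maximum
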
# We can see that minimum-variance property by using simulation. Here,
# we calculate the sample variance (i.e., using N-1 when we divide) for
# half a million samples of N=15 from a normal distribution with a mean
# of 50 and a variance of 100:

> x <- NULL
> for(i in 1:500000) x[i] <- var(rnorm(15, 50, 10))

# Here's the sampling distribution:

> hist(x)

# And here's the standard deviation of the sampling distribution
# (i.e., the standard error of the sample variance):

> sd(x)
[1] 37.80832

# Here, we convert the sample variance estimates into population
# variance estimates (i.e., dividing by N instead of N-1):

> hist( 14*x/15 )

# Note that the standard error of the new sampling distribution
# is smaller than the standard error for the sample estimates.
# That HAS TO be the case, because we know that maximum likelihood
# estimators are "minimum variance" estimators:

> sd( 14*x/15 )
[1] 35.28777

# So that's a good property of the maximum likelihood estimator.
# But we also know that, whereas the sample variance is unbiased...

> mean(x)
[1] 100.0083

# ...the maximum likelihood estimator is biased downward:

> mean( 14*x/15 )
[1] 93.34108

# That sounds bad. But if we consider the root mean squared error,
# which is calculated like the standard deviation of the sampling
# distribution, but focusing on deviations from the TRUE VALUE of the
# thing being estimated rather than from its mean, the sample variance
# tends to be further from the true value...

> sqrt( sum((x-100)^2/500000 ))
[1] 37.80828

# ...than the maximum likelihood estimate:

> xmle <- 14*x/15
> sqrt( sum((xmle-100)^2/500000 ))
[1] 35.91052

# So despite being biased, the maximum likelihood estimate tends
# to be closer to the truth!
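
# (A closing note added here, not from the class session: the reason a
# biased estimator can still win on root mean squared error is the
# decomposition MSE = variance + bias^2. With half a million samples,
# the N-1 vs. N difference in sd() is negligible, so the two RMSEs above
# can be recovered from the standard errors and biases already computed:)

sqrt( sd(x)^2 + (mean(x) - 100)^2 )         # should come out near 37.81
sqrt( sd(xmle)^2 + (mean(xmle) - 100)^2 )   # should come out near 35.91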