Consider the following matrix:
7 0 0 0 0
0 3 0 0 0
0 0 5 0 0
0 0 0 1 0
0 0 0 0 2
Now consider another matrix:
1 1 9 0  7
3 3 0 0  0
1 0 5 0 10
9 0 0 1  9
0 3 0 0  2
Finally, consider a third matrix:
1 1 9  2  7
3 3 0  6  0
1 0 5  2 10
9 0 0 18  9
0 3 0  0  2
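For working with these matrices in R, the following is a minimal sketch. It assumes each run of 25 numbers is a 5 x 5 matrix listed row by row (a reading suggested by the diagonal pattern of the first one). The names m1, m2, m3, and mats are illustrative only, and the determinant and rank are just one possible way to compare the three.

```r
# Assumption: each 25-number run is a 5 x 5 matrix given row by row,
# so byrow = TRUE is used to fill across rows first.
m1 <- matrix(c(7, 0, 0, 0, 0,
               0, 3, 0, 0, 0,
               0, 0, 5, 0, 0,
               0, 0, 0, 1, 0,
               0, 0, 0, 0, 2), nrow = 5, byrow = TRUE)

m2 <- matrix(c(1, 1, 9, 0,  7,
               3, 3, 0, 0,  0,
               1, 0, 5, 0, 10,
               9, 0, 0, 1,  9,
               0, 3, 0, 0,  2), nrow = 5, byrow = TRUE)

m3 <- matrix(c(1, 1, 9,  2,  7,
               3, 3, 0,  6,  0,
               1, 0, 5,  2, 10,
               9, 0, 0, 18,  9,
               0, 3, 0,  0,  2), nrow = 5, byrow = TRUE)

mats <- list(first = m1, second = m2, third = m3)

# Determinant and numerical rank of each matrix
sapply(mats, det)
sapply(mats, function(m) qr(m)$rank)
```

Comparing the columns of the third matrix with one another (for example, m3[, 4] against m3[, 1]) may help explain whatever the determinant and rank show.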
Create a data set consisting of two variables, X and Y, for which the regression of Y on X is approximately Y = 10 + 0.7X and the correlation is modest (somewhere around 0.3). Include an annotated transcript of your work in R.
Produce a scatterplot of Y against X.
Use either R or SAS to verify that the regression and correlation are approximately what you designed them to be.
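One possible approach to this exercise, sketched in R under stated assumptions: draw X from a normal distribution, then add normal noise to 10 + 0.7X, with the noise standard deviation chosen so that the population correlation is 0.3. If sd_x is the standard deviation of X and sd_e that of the noise, then rho = 0.7 sd_x / sqrt(0.49 sd_x^2 + sd_e^2), which gives sd_e = 0.7 sd_x sqrt(1/rho^2 - 1). The sample size, seed, and the mean and standard deviation of X below are arbitrary choices, not part of the assignment.

```r
set.seed(2024)            # arbitrary seed, for reproducibility only

n    <- 2000              # assumed sample size
b0   <- 10                # target intercept
b1   <- 0.7               # target slope
rho  <- 0.3               # target correlation
sd_x <- 10                # assumed spread of X

# Noise SD that yields the target correlation for this slope and sd_x
sd_e <- b1 * sd_x * sqrt(1 / rho^2 - 1)

X <- rnorm(n, mean = 20, sd = sd_x)
Y <- b0 + b1 * X + rnorm(n, mean = 0, sd = sd_e)

# Scatterplot of Y against X with the fitted line overlaid
plot(X, Y, main = "Simulated Y against X")
fit <- lm(Y ~ X)
abline(fit)

# Check that the estimates are near the design values
coef(fit)
cor(X, Y)
```

Because the data are random, the fitted intercept, slope, and correlation will only be close to 10, 0.7, and 0.3; a larger n tightens the agreement.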
Here is a link to a data set consisting of taste ratings of cheddar cheese, along with amounts of three components of cheese that tend to increase as the cheese ages. The taste ratings are in the first column. The second column is amount of acetic acid. The third column is amount of hydrogen sulfide (H2S), the substance responsible for rotten egg odor. The fourth column is amount of lactic acid. (Source: the Data and Story Library, http://lib.stat.cmu.edu/DASL/ )
Use R to produce scatterplots of the relationship of the taste ratings with each of the substance variables. Comment on the linearity of each plot (and hence, the appropriateness of using linear methods).
In SAS, compute a separate regression of taste ratings on each of the three substances. Then compute the multiple regression of taste on all three substances simultaneously. You will notice that the significance of one of the predictors is dramatically different in the multiple regression. Comment on how the significance changes. Specifically, is the predictor that is least significant in the multiple regression the same as the one that is least significant in a simple (one predictor) regression? Discuss any insight you may have into what is going on here. (You may find that an investigation of the intercorrelations of the predictors is relevant.)
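As background for the discussion of intercorrelated predictors, here is a small illustration in R (used for consistency with the earlier parts, although the exercise itself asks for SAS). It uses simulated data, not the cheese data, and the variables x1, x2, and y are hypothetical. The sketch builds two highly correlated predictors, makes y depend directly on only one of them, and then compares the p-value for x1 in a simple regression with its p-value in the multiple regression.

```r
set.seed(1)                       # arbitrary seed

n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)     # x2 is built to be strongly correlated with x1
y  <- 2 * x2 + rnorm(n)           # y depends directly on x2 only

# Intercorrelation of the two predictors
cor(x1, x2)

# Coefficient tables from the simple and multiple regressions
simple   <- summary(lm(y ~ x1))$coefficients
multiple <- summary(lm(y ~ x1 + x2))$coefficients

# p-value for x1 alone, then for x1 and x2 together
simple["x1", "Pr(>|t|)"]
multiple[c("x1", "x2"), "Pr(>|t|)"]
```

In a setup like this, a predictor that looks strongly significant on its own can lose much of its apparent importance once a correlated predictor is in the model, because the multiple regression tests each predictor's contribution beyond what the others already explain.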