STATS 500 - Homework 2
Due in class October 1
Part A (Maximum 2 pages). The dataset uswages is drawn as a sample
from the Current Population Survey in 1988.
1. Fit a regression model with weekly wages as the response and years of
education and experience as predictors. Present the output.
2. What percentage of variation in the response is explained by these
predictors? (The percentage of variance explained is the same as the
coefficient of determination.)
3. Which observation has the largest (positive) residual? Give the case
number.
4. Compute the mean and median of the residuals. Explain what the
difference between the mean and the median indicates.
5. For two people with the same education and a one-year difference in
experience, what would be the difference in predicted weekly wages?
6. Compute the correlation of the residuals with the fitted values. Plot
residuals against fitted values. Explain the value of this correlation
using the geometric (projection) interpretation of least squares.
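The geometric argument behind item 6 can be written out as follows. It assumes, as lm() does by default, that the design matrix $X$ contains an intercept column; $y$ is the response vector, $H$ the hat (projection) matrix and $\hat\varepsilon$ the residual vector. Without an intercept the residuals are still orthogonal to the fitted values, but their sample correlation need not be zero. In floating point the computed correlation is tiny rather than exactly 0.

```latex
\begin{aligned}
\hat y &= Hy, \qquad H = X(X^\top X)^{-1}X^\top, \qquad \hat\varepsilon = y - \hat y = (I - H)\,y,\\
\hat y^\top \hat\varepsilon &= y^\top H^\top (I - H)\,y = y^\top (H - H^2)\,y = 0
  \qquad (H^\top = H,\ H^2 = H),\\
\mathbf{1} \in \operatorname{col}(X) &\;\Rightarrow\; (I - H)\mathbf{1} = 0
  \;\Rightarrow\; \mathbf{1}^\top \hat\varepsilon = 0
  \;\Rightarrow\; \bar{\hat\varepsilon} = 0,\\
\sum_{i=1}^{n} (\hat\varepsilon_i - \bar{\hat\varepsilon})(\hat y_i - \bar{\hat y})
  &= \hat\varepsilon^\top \hat y - n\,\bar{\hat\varepsilon}\,\bar{\hat y} = 0
  \;\Rightarrow\; \operatorname{cor}(\hat\varepsilon, \hat y) = 0.
\end{aligned}
```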
Hints: Useful R functions: data(), lm(), summary(), residuals(), fitted(),
which.max(), mean(), median(), cor(), plot(). Note that the experience
variable has some negative values, which most likely indicate missing data.
Those observations should be removed from the analysis.
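The assignment asks for R. What follows is only a Python/NumPy sketch of the same steps (the filtering hint plus items 1 to 6), run on a hypothetical synthetic data set with made-up coefficients. The real uswages data are not reproduced here, so the printed numbers are not answers for uswages. A column of ones mirrors the intercept that lm(wage ~ educ + exper) includes by default. np.linalg.lstsq, np.argmax, np.median and np.corrcoef stand in for lm(), which.max(), median() and cor(). In R, which.max() on the named residual vector also reports the case's row name.

```python
import numpy as np

# HYPOTHETICAL synthetic stand-in for uswages (NOT the real 1988 CPS data).
rng = np.random.default_rng(0)
n = 200
educ = rng.integers(8, 19, size=n).astype(float)
exper = rng.integers(-2, 40, size=n).astype(float)  # some negatives mimic missing codes
wage = 50.0 + 40.0 * educ + 10.0 * exper + rng.normal(0.0, 150.0, size=n)

# Drop cases with negative experience, treating them as missing (per the hint).
keep = exper >= 0
case_ids = np.flatnonzero(keep) + 1  # 1-based row numbers in the unfiltered data
y, ed, ex = wage[keep], educ[keep], exper[keep]

# 1. Least-squares fit with an intercept column.
X = np.column_stack([np.ones_like(ed), ed, ex])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# 2. Coefficient of determination: R^2 = 1 - RSS / TSS.
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# 3. Case with the largest positive residual, reported by its original row number.
largest_case = int(case_ids[np.argmax(resid)])

# 4. Mean vs. median of the residuals.
resid_mean, resid_median = resid.mean(), np.median(resid)

# 5. Difference in predicted wage for one extra year of experience at fixed
#    education: the experience coefficient.
exper_effect = coef[2]

# 6. Sample correlation between residuals and fitted values.
corr = np.corrcoef(resid, fitted)[0, 1]

print("coefficients (intercept, educ, exper):", coef)
print(f"R^2 = {r2:.3f}, largest residual at case {largest_case}")
print(f"residual mean = {resid_mean:.2e}, median = {resid_median:.2f}")
print(f"one more year of experience -> {exper_effect:.2f} change in predicted wage")
print(f"corr(residuals, fitted) = {corr:.2e}")
```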
Part B (Maximum 2 pages). Using R, create a 10 × 3 matrix X:
Now create a 3 × 1 matrix $\beta$ whose entries are 1, -1, and 2. Next create a
10 × 1 matrix $\varepsilon$ whose entries are IID standard normal (useful command:
“rnorm”). Finally, set $Y = X\beta + \varepsilon$.
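The setup above, together with items 1 to 3 below, can be sketched in Python/NumPy, although the assignment itself asks for R. The assignment's 10 × 3 matrix X is not reproduced in this copy, so the code uses a HYPOTHETICAL placeholder X (an intercept column plus x and x² for x = 1, ..., 10). Replace it with the actual matrix. One difference in calling convention: R's solve(A) with one argument returns $A^{-1}$, while np.linalg.solve(A, b) solves $Ax = b$ directly. The true-variance line uses the standard result $\operatorname{Var}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$ for IID errors with variance $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(500)

# HYPOTHETICAL placeholder for the assignment's 10 x 3 matrix X (not reproduced
# here): an intercept column plus x and x^2 for x = 1, ..., 10.
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones(10), x, x ** 2])
beta = np.array([1.0, -1.0, 2.0])  # true coefficients
sigma2 = 1.0                       # true error variance (standard normal errors)

# epsilon: 10 IID standard normal errors, then Y = X beta + epsilon.
eps = rng.standard_normal(10)
Y = X @ beta + eps

# 1. beta_hat = (X'X)^{-1} X'Y, computed directly rather than with a model-fitting call.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ Y)

# 2. True covariance matrix of beta_hat: sigma^2 (X'X)^{-1}, a 3 x 3 matrix.
true_var = sigma2 * np.linalg.inv(XtX)

# 3. Estimate sigma^2 from the residuals: RSS / (n - p), here n - p = 7.
resid = Y - X @ beta_hat
n, p = X.shape
sigma2_hat = float(resid @ resid) / (n - p)

print("beta_hat:", beta_hat)
print("true Var(beta_hat):\n", true_var)
print("sigma2_hat:", sigma2_hat)
```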
1. Calculate $(X^\top X)^{-1} X^\top Y$ to estimate $\beta$. What do you get? (Don’t use
the “lm” command. Do the computation directly. You can use the
“solve” command to compute a matrix inverse.)
2. What is the true variance of $\hat\beta$? (Remember that the variance of $\hat\beta$ is
a 3 × 3 matrix.) (I say the “true” variance because, in this example,
we know the true value of $\sigma^2$, and so don’t need to estimate it using
the residuals.)
3. Use the residuals to estimate $\sigma^2$. What do you get?
4. Now create a new $\varepsilon$ and re-estimate $\beta$. Do this 1,000 times, and save
all the answers in memory. Make a histogram of the 1,000 values of $\hat\beta_1$.
Do the same for $\hat\beta_2$ and $\hat\beta_3$. Also calculate the variance of each of
these. Do your answers match your answer to question 2?
5. Once again, re-create $\varepsilon$ 1,000 times. Each time, estimate $\beta$ and also
estimate $\sigma^2$. Make a histogram of your 1,000 values of $\hat\sigma^2$. Based on the
histogram, does $\hat\sigma^2$ look like a reliable estimate of $\sigma^2$? Why do you think this is?
6. Repeat (4) and (5), but instead of drawing $\varepsilon$ from a normal distribution, use
some other distribution that also has expectation 0 and variance 1. Do
your answers change much? Explain. You might want to experiment
with a few different distributions.
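Items 4 to 6 can be sketched as a small Monte Carlo loop in Python/NumPy; the assignment itself asks for R. Assumptions: X is the same HYPOTHETICAL placeholder (intercept, x, x² for x = 1, ..., 10) because the assignment's matrix is not reproduced here, and the two non-normal error laws, Uniform(−√3, √3) and Exponential(1) − 1, are illustrative choices with mean 0 and variance 1, not ones prescribed by the assignment. Histograms are left out to keep the sketch dependent on NumPy only; R's hist() (or matplotlib in Python) would cover that step.

```python
import numpy as np

# HYPOTHETICAL placeholder X (the assignment's matrix is not reproduced here).
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones(10), x, x ** 2])
beta = np.array([1.0, -1.0, 2.0])
true_var = np.linalg.inv(X.T @ X)  # sigma^2 (X'X)^{-1} with sigma^2 = 1


def simulate(X, beta, draw_errors, reps=1000, seed=1):
    """Redraw the errors `reps` times; return all beta_hat vectors and sigma2_hat values."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    proj = np.linalg.inv(X.T @ X) @ X.T  # maps Y to beta_hat
    betas = np.empty((reps, p))
    sigma2s = np.empty(reps)
    for r in range(reps):
        Y = X @ beta + draw_errors(rng, n)
        b = proj @ Y
        resid = Y - X @ b
        betas[r] = b
        sigma2s[r] = resid @ resid / (n - p)  # RSS / (n - p)
    return betas, sigma2s


# Error distributions, each with mean 0 and variance 1.
def normal(rng, n):
    return rng.standard_normal(n)


def uniform(rng, n):
    # Uniform(-sqrt(3), sqrt(3)) has variance (2 sqrt(3))^2 / 12 = 1.
    return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)


def shifted_exponential(rng, n):
    # Exponential(1) has mean 1 and variance 1; subtracting 1 centres it (still skewed).
    return rng.exponential(1.0, n) - 1.0


results = {}
for name, draw in [("normal", normal), ("uniform", uniform),
                   ("shifted exponential", shifted_exponential)]:
    betas, sigma2s = simulate(X, beta, draw)
    results[name] = (betas, sigma2s)
    print(name)
    print("  sample Var(beta_hat_j): ", betas.var(axis=0, ddof=1))
    print("  diag of (X'X)^{-1}:     ", np.diag(true_var))
    print(f"  sigma2_hat: mean {sigma2s.mean():.3f}, sd {sigma2s.std(ddof=1):.3f}")
```

For normal errors, $(n-p)\hat\sigma^2/\sigma^2$ follows a $\chi^2_{7}$ distribution here ($n - p = 10 - 3$). So $\hat\sigma^2$ has mean 1 but standard deviation $\sqrt{2/7} \approx 0.53$, which shows up as a wide, right-skewed histogram. For any IID errors with mean 0 and variance $\sigma^2$, $\hat\beta$ stays unbiased with covariance $\sigma^2 (X^\top X)^{-1}$. The shapes of the sampling distributions, and the spread of $\hat\sigma^2$ (which depends on the error distribution's fourth moment), can change.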