Questions
- (30 points) Assume the following model: $Y_i = 1.25 - 0.2 X_i + \epsilon_i$, where the $\epsilon_i$ are i.i.d. $N(0, 1)$.
  - (a) Find $E[Y \mid X = 0]$, $E[Y \mid X = 1]$, and $\mathrm{Var}[Y \mid X]$.
  - (b) What is the probability that $Y > 1$, given $X = -2$?
  - (c) If $E[X_i] = 0$ and $\mathrm{Var}(X_i) = 2$, what are $E[Y]$ and $\mathrm{Var}(Y)$?
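As a hedged illustration of the conditional probability question above, the calculation can be checked numerically with `pnorm`. This sketch assumes the model reads $Y_i = 1.25 - 0.2 X_i + \epsilon_i$ with $\epsilon_i \sim N(0, 1)$, so that $Y$ given $X = x$ is normal with mean $1.25 - 0.2x$ and variance 1. The variable names are illustrative only.

```r
# Assumed model: Y = 1.25 - 0.2 * X + eps, with eps ~ N(0, 1)
x0 <- -2
cond_mean <- 1.25 - 0.2 * x0   # E[Y | X = x0]
cond_sd <- 1                   # sd of eps, hence sd of Y given X

# P(Y > 1 | X = x0) is the upper tail of N(cond_mean, cond_sd^2)
p_gt_1 <- pnorm(1, mean = cond_mean, sd = cond_sd, lower.tail = FALSE)
print(p_gt_1)
```

Changing `x0` gives the same probability at other values of $X$.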
- (40 points) Using a seed of 123 and the `rnorm` function in R, generate a sample of size $n = 1000$ of $X_i \sim N(0, \sigma_X^2)$ where $\sigma_X^2 = 4$, and generate $Y_i = 1.25 - 0.2 X_i + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma_\epsilon^2)$ and $\sigma_\epsilon^2 = 1$.
  - (a) Plot the scatter plot of $X$ and $Y$.
  - (b) Split the sample into 2 subsets of size 250 and 750. For each subset, run the regression of $Y$ on $X$ using `lm`. Add each fitted regression line (use color) to your plot from (a). Are the lines the same? Explain why or why not.
  - (c) Repeat the steps for different values of $n$, $\sigma_X^2$, and $\sigma_\epsilon^2$. Do you get similar results? What changes and why?
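One possible way to set up the simulation described above is sketched below. It assumes the model $Y_i = 1.25 - 0.2 X_i + \epsilon_i$, and it splits the sample by position (first 250 rows versus the remaining 750), which the question does not specify; a random split would work equally well. The object names (`dat`, `fit_small`, `fit_large`) and the colors are illustrative choices.

```r
set.seed(123)                       # seed given in the question
n <- 1000
sigma2_x <- 4                       # variance of X
sigma2_eps <- 1                     # variance of the error term

# rnorm() takes a standard deviation, so pass the square root of each variance
x <- rnorm(n, mean = 0, sd = sqrt(sigma2_x))
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2_eps))
y <- 1.25 - 0.2 * x + eps           # assumed data-generating model
dat <- data.frame(x = x, y = y)

# Assumption: split by row position into subsets of size 250 and 750
fit_small <- lm(y ~ x, data = dat[1:250, ])
fit_large <- lm(y ~ x, data = dat[251:n, ])

# Scatter plot with both fitted lines overlaid in different colors
plot(dat$x, dat$y, xlab = "X", ylab = "Y", pch = 20, col = "grey60")
abline(fit_small, col = "red", lwd = 2)
abline(fit_large, col = "blue", lwd = 2)
legend("topright", legend = c("n = 250", "n = 750"),
       col = c("red", "blue"), lwd = 2)

print(rbind(small = coef(fit_small), large = coef(fit_large)))
```

Re-running the block with other values of `n`, `sigma2_x`, and `sigma2_eps` (adjusting the split sizes accordingly) is one way to explore the last part of the question.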
- (30 points) For this exercise, we use the R data set `Wage` from the `ISLR` package. Try `?Wage` for more information on the data set. For this question, our $Y$ variable is `wage`, and $X$ is `age`.
head(Wage)
##        year age           maritl     race       education             region
## 231655 2006  18 1. Never Married 1. White    1. < HS Grad 2. Middle Atlantic
## 86582  2004  24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic
## 161300 2003  45       2. Married 1. White 3. Some College 2. Middle Atlantic
## 155159 2003  43       2. Married 3. Asian 4. College Grad 2. Middle Atlantic
## 11443  2005  50      4. Divorced 1. White      2. HS Grad 2. Middle Atlantic
## 376662 2008  54       2. Married 1. White 4. College Grad 2. Middle Atlantic
##              jobclass         health health_ins  logwage      wage
## 231655  1. Industrial      1. <=Good      2. No 4.318063  75.04315
## 86582  2. Information 2. >=Very Good      2. No 4.255273  70.47602
## 161300  1. Industrial      1. <=Good     1. Yes 4.875061 130.98218
## 155159 2. Information 2. >=Very Good     1. Yes 5.041393 154.68529
## 11443  2. Information      1. <=Good     1. Yes 4.318063  75.04315
## 376662 2. Information 2. >=Very Good     1. Yes 4.845098 127.11574
  - (a) Run the regression $\text{wage}_i = \beta_0 + \beta_1 \text{age}_i + \epsilon_i$ using the `lm` function. Is the slope significant?
  - (b) Now, estimate the slope and intercept without using the `lm` function, as we did in class. Do you get the same results?
  - (c) Suppose you were interested in the wage of someone who is 40 years old. What would you expect their wage to be? What is the 95% prediction interval for their wage? Compare the endpoints of the interval to the observed values of `wage`. What do you conclude about your prediction from this? Is this conclusion surprising? Why or why not?
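The method used in class is not shown here, so the following is only a sketch that assumes the standard closed-form least-squares estimates, $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The helper `ols_by_hand` is a hypothetical name, and the data are simulated with arbitrary parameters (not the `Wage` data) so that the block runs without the `ISLR` package.

```r
# Hypothetical helper: closed-form simple linear regression estimates
ols_by_hand <- function(x, y) {
  slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

# Simulated stand-in data; the parameters below are arbitrary, not estimates
set.seed(1)
x <- runif(200, min = 18, max = 80)
y <- 80 + 0.7 * x + rnorm(200, sd = 40)

manual <- ols_by_hand(x, y)
fit <- lm(y ~ x)

# The hand-computed estimates should agree with lm() up to rounding error
print(all.equal(unname(manual), unname(coef(fit))))

# Point prediction and 95% prediction interval at x = 40
pred_40 <- predict(fit, newdata = data.frame(x = 40),
                   interval = "prediction", level = 0.95)
print(pred_40)
```

With the actual data, the same helper could be called as `ols_by_hand(Wage$age, Wage$wage)`, and `predict` could be applied to the model fitted with `lm(wage ~ age, data = Wage)` using `newdata = data.frame(age = 40)`.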