Application of Biostat to Big Data

Daniel Kevins


1. Pick the 10-factor dataset we know from our Linear Regression HW!
• Use Lecture_GLM_lasso.R to answer the following questions:
– Which Xs did the Lasso find?
– If you take the Lasso-found Xs and fit a linear regression to them, what do you find?
– If you use linear regression, from scratch, to find your Xs, how do they compare with the Lasso-found Xs?
– If you use ridge regression (RR) alone, from scratch, to find your Xs, how does that compare with the Lasso-found Xs?
– Which do you think is the “best” model? Why?
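
The Lasso selection and refit steps in question 1 can be illustrated with a minimal pure-Python sketch. This is not Lecture_GLM_lasso.R. It minimizes (1/(2n))·RSS + λ·Σ|b_j| by cyclic coordinate descent with soft-thresholding, treats the “Lasso-found Xs” as the columns whose coefficients are nonzero, and then refits ordinary least squares on just those columns. The names `lasso_cd` and `ols`, the penalty value, and the toy data are hypothetical. With real data the Xs should usually be standardized first so that a single λ penalizes them evenly.

```python
import random

def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Minimise (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats), y a list of n floats.
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Centre X and y so the intercept drops out of the coordinate updates.
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    resid = [v - y_mean for v in y]  # residual when every b_j = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its coefficient at 0
            # (1/n) * x_j' (residual with x_j's own contribution added back)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return b0, b

def ols(X, y):
    """Least squares with intercept via the normal equations (Gauss-Jordan).

    Returns [intercept, b_1, ..., b_k]; assumes X'X is non-singular (n > k).
    """
    A = [[1.0] + list(r) for r in X]
    k = len(A[0])
    # Augmented matrix [A'A | A'y]
    M = [[sum(a[r] * a[c] for a in A) for c in range(k)]
         + [sum(a[r] * yi for a, yi in zip(A, y))] for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][k] for r in range(k)]

# Hypothetical toy data: y depends on the first and third columns only.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = [3 + 2 * r[0] - 1.5 * r[2] + rng.gauss(0, 0.1) for r in X]

b0, b = lasso_cd(X, y, lam=0.3)
selected = [j for j, bj in enumerate(b) if bj != 0.0]
refit = ols([[r[j] for j in selected] for r in X], y)
print("Lasso keeps columns:", selected)
print("OLS refit on those columns:", [round(v, 2) for v in refit])
```

Refitting OLS on the selected columns removes the shrinkage toward zero that the Lasso applies to the coefficients it keeps. This is one reason the refit estimates can differ from the Lasso's own.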

2. Let’s pick the dataset my_data_enet_for_SYSM590.csv, with 100 Xs and 36 observations.
• The “true” coefficients, shown in red in the Table below, were used to “create” Y. Each X was multiplied by its corresponding coefficient and the products were summed with the intercept, which was set to 7500. A random error drawn from a Normal distribution with mean 0 and standard deviation 1500 was then added to each observation. That is, the “true” model is created as follows:

$$Y = 7500 + 2X_1 - 0.5X_2 + 0.25X_3 + \dots + X_{100} + \varepsilon, \qquad \varepsilon \sim N(0,\ 1500^2)$$

[Table: “true” coefficients for X1 through X100 used to create Y, shown in red]
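
The generating recipe above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that produced my_data_enet_for_SYSM590.csv. The function name `make_response` and the toy inputs are hypothetical, and the full coefficient list stays in the table rather than being repeated here. The sketch shows that the Normal(0, 1500) error enters once per observation, after the products have been summed.

```python
import random

def make_response(X, beta, intercept=7500.0, sigma=1500.0, seed=None):
    """Build Y_i = intercept + sum_j beta_j * X_ij + e_i with e_i ~ Normal(0, sigma).

    One error term is drawn per observation (row), after the products
    beta_j * X_ij have been summed.
    """
    rng = random.Random(seed)
    y = []
    for row in X:
        if len(row) != len(beta):
            raise ValueError("each row of X needs exactly one value per coefficient")
        linear_part = intercept + sum(b * x for b, x in zip(beta, row))
        y.append(linear_part + rng.gauss(0.0, sigma))  # single noise draw per row
    return y

# Hypothetical toy inputs: a 2-variable example, not the real 100 Xs.
X_toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
beta_toy = [2.0, -0.5]
print(make_response(X_toy, beta_toy, sigma=0.0))  # → [7501.0, 7504.0, 7507.0]
```

With `sigma=0.0` the output is just the linear part of the model. With the default `sigma=1500.0` each Y scatters around that value with standard deviation 1500.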

Use Lecture_GLM_enet.R to find the “best” model.
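
With 100 Xs and only 36 observations, ordinary least squares on all the Xs has no unique solution because X'X is singular. This is why a penalized fit is needed here. The following is a minimal pure-Python sketch of elastic-net coordinate descent with k-fold cross-validation over a grid of (alpha, lambda). It is not Lecture_GLM_enet.R. It uses the same objective parameterization as R's glmnet, (1/(2n))·RSS + λ·((1 − α)/2·Σb_j² + α·Σ|b_j|). However, it does not standardize the Xs, and the function names, grids, fold count, and toy data are all assumptions.

```python
import random

def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def enet_cd(X, y, lam, alpha, n_iter=1000, tol=1e-8):
    """Elastic net by cyclic coordinate descent.

    Minimises (1/(2n))*||y - b0 - X b||^2
              + lam * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1).
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    resid = [v - y_mean for v in y]
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            # L1 part soft-thresholds; L2 part inflates the denominator.
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return b0, b

def cv_mse(X, y, lam, alpha, k=5, seed=0):
    """Mean squared prediction error from k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b0, b = enet_cd([X[i] for i in train], [y[i] for i in train], lam, alpha)
        for i in test:
            pred = b0 + sum(bj * xij for bj, xij in zip(b, X[i]))
            total += (y[i] - pred) ** 2
    return total / len(X)

def best_enet(X, y, lambdas, alphas, k=5, seed=0):
    """Grid-search (lambda, alpha); return (cv_mse, lambda, alpha) with lowest CV error."""
    best = None
    for alpha in alphas:
        for lam in lambdas:
            err = cv_mse(X, y, lam, alpha, k, seed)
            if best is None or err < best[0]:
                best = (err, lam, alpha)
    return best

# Hypothetical toy data: 40 rows, 4 Xs, only the first one drives y.
rng = random.Random(7)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
y = [1 + 3 * r[0] + rng.gauss(0, 0.2) for r in X]

err, lam, alpha = best_enet(X, y, lambdas=[0.01, 0.1, 1.0, 10.0], alphas=[0.0, 0.5, 1.0])
print(f"best CV MSE {err:.3f} at lambda={lam}, alpha={alpha}")
```

Setting alpha to 1 gives the Lasso and setting it to 0 gives ridge regression. The same grid search can therefore compare all three penalty families on held-out error.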
