Overview and Rationale
To turn your theoretical knowledge into practical skills, you will use the glmnet package in R to build linear and logistic models with Ridge and LASSO regression over a range of values of the regularization parameter lambda.
Course Outcomes
This assignment is directly linked to the following key learning outcomes from the course syllabus:
- Apply regularization methods to build models that describe relationships among variables and make useful predictions.
Assignment Summary
Use the College dataset (https://rdrr.io/cran/ISLR/man/College.html) from the ISLR library to build regularized models using Ridge and LASSO (least absolute shrinkage and selection operator) regression. Predict Grad.Rate in all models.
- Split the data into a training set and a test set. Refer to the Feature_Selection_R.pdf document for information on how to split a dataset.
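As a minimal sketch of the split step, assuming a 70/30 ratio and an arbitrary seed (neither is specified by the assignment, and the referenced PDF may use a different method), the data can be partitioned with base R. The names train_idx, train_x, and test_x are illustrative only. Because glmnet expects a numeric predictor matrix rather than a data frame, the sketch also builds one with model.matrix().

```r
library(ISLR)   # provides the College data frame

set.seed(123)   # arbitrary seed, used only to make the split reproducible

# Assumption: 70% of rows for training, the rest for testing
n <- nrow(College)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

train <- College[train_idx, ]
test  <- College[-train_idx, ]

# model.matrix() converts the factor Private into a dummy column;
# [, -1] drops the intercept column, which glmnet adds on its own
train_x <- model.matrix(Grad.Rate ~ ., data = train)[, -1]
test_x  <- model.matrix(Grad.Rate ~ ., data = test)[, -1]
train_y <- train$Grad.Rate
test_y  <- test$Grad.Rate
```

The same seed and ratio should be reused for every model so that the RMSE values are comparable.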
Ridge Regression
- Use the cv.glmnet function to estimate the lambda.min and lambda.1se values. Compare and discuss the values.
- Plot the results from the glmnet function and provide an interpretation. What does this plot tell us?
- Fit a Ridge regression model against the training set and report on the coefficients. Is there anything interesting?
- Determine the performance of the fitted model against the training set by calculating the root mean square error (RMSE): sqrt(mean((actual - predicted)^2)).
- Determine the performance of the fitted model against the test set by calculating the RMSE. Is your model overfit?
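The Ridge steps above can be sketched as follows. This is one possible workflow, not a required solution: the 70/30 split, the seed, the use of lambda.min for the final predictions, and the helper rmse() are all assumptions made for illustration. In glmnet, alpha = 0 selects the Ridge penalty.

```r
library(ISLR)
library(glmnet)

set.seed(123)  # arbitrary seed; assumed 70/30 split
train_idx <- sample(seq_len(nrow(College)), size = floor(0.7 * nrow(College)))
x <- model.matrix(Grad.Rate ~ ., data = College)[, -1]
y <- College$Grad.Rate
train_x <- x[train_idx, ]; test_x <- x[-train_idx, ]
train_y <- y[train_idx];   test_y <- y[-train_idx]

# Cross-validate over the lambda path; alpha = 0 is the Ridge penalty
cv_ridge <- cv.glmnet(train_x, train_y, alpha = 0, nfolds = 10)
cv_ridge$lambda.min   # lambda with the lowest cross-validated error
cv_ridge$lambda.1se   # largest lambda within one standard error of that minimum

plot(cv_ridge)        # cross-validated MSE against log(lambda)

# Fit the full Ridge path on the training set and inspect coefficients
ridge_fit <- glmnet(train_x, train_y, alpha = 0)
plot(ridge_fit, xvar = "lambda", label = TRUE)  # coefficient paths
coef(ridge_fit, s = cv_ridge$lambda.min)

# Hypothetical helper implementing the RMSE formula from the assignment
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

pred_train <- as.numeric(predict(ridge_fit, newx = train_x, s = cv_ridge$lambda.min))
pred_test  <- as.numeric(predict(ridge_fit, newx = test_x,  s = cv_ridge$lambda.min))
ridge_rmse_train <- rmse(train_y, pred_train)
ridge_rmse_test  <- rmse(test_y, pred_test)
```

A test RMSE that is much larger than the training RMSE would suggest overfitting. Replacing cv_ridge$lambda.min with cv_ridge$lambda.1se gives the more heavily regularized alternative.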
LASSO
- Use the cv.glmnet function to estimate the lambda.min and lambda.1se values. Compare and discuss the values.
- Plot the results from the glmnet function and provide an interpretation. What does this plot tell us?
- Fit a LASSO regression model against the training set and report on the coefficients. Do any coefficients reduce to zero? If so, which ones?
- Determine the performance of the fitted model against the training set by calculating the root mean square error (RMSE): sqrt(mean((actual - predicted)^2)).
- Determine the performance of the fitted model against the test set by calculating the RMSE. Is your model overfit?
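A comparable sketch for the LASSO steps, under the same assumptions (70/30 split, arbitrary seed, predictions at lambda.min, illustrative helper rmse()). Here alpha = 1 selects the LASSO penalty, and the last lines show one way to list the coefficients that were shrunk exactly to zero.

```r
library(ISLR)
library(glmnet)

set.seed(123)  # arbitrary seed; assumed 70/30 split
train_idx <- sample(seq_len(nrow(College)), size = floor(0.7 * nrow(College)))
x <- model.matrix(Grad.Rate ~ ., data = College)[, -1]
y <- College$Grad.Rate
train_x <- x[train_idx, ]; test_x <- x[-train_idx, ]
train_y <- y[train_idx];   test_y <- y[-train_idx]

# Cross-validate over the lambda path; alpha = 1 is the LASSO penalty
cv_lasso <- cv.glmnet(train_x, train_y, alpha = 1, nfolds = 10)
cv_lasso$lambda.min
cv_lasso$lambda.1se

plot(cv_lasso)  # the top axis shows the number of nonzero coefficients

# Fit the full LASSO path on the training set
lasso_fit <- glmnet(train_x, train_y, alpha = 1)
plot(lasso_fit, xvar = "lambda", label = TRUE)

# Coefficients at the chosen lambda, converted to a named numeric vector
lasso_coef <- as.matrix(coef(lasso_fit, s = cv_lasso$lambda.min))[, 1]
zeroed <- names(lasso_coef)[lasso_coef == 0]  # predictors dropped by LASSO
zeroed

# Hypothetical helper implementing the RMSE formula from the assignment
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

pred_train <- as.numeric(predict(lasso_fit, newx = train_x, s = cv_lasso$lambda.min))
pred_test  <- as.numeric(predict(lasso_fit, newx = test_x,  s = cv_lasso$lambda.min))
lasso_rmse_train <- rmse(train_y, pred_train)
lasso_rmse_test  <- rmse(test_y, pred_test)
```

At lambda.min the list of zeroed coefficients may be short or empty; lambda.1se typically yields a sparser model, so it is worth reporting both.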
Comparison
- Which model performed better and why? Is that what you expected?
- Refer to the Intermediate_Analytics_Feature_Selection_R.pdf document for how to perform stepwise selection and then fit a model. Did this model perform better than, or as well as, Ridge regression or LASSO? Which method do you prefer and why?
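For the stepwise comparison, the referenced PDF is the authoritative guide. As a rough sketch only, assuming AIC-based selection with base R's step() function (the PDF may use a different function or criterion), a stepwise model can be fitted and scored on the same split like this. The names full_model, step_model, and rmse() are illustrative.

```r
library(ISLR)

set.seed(123)  # same arbitrary seed and assumed 70/30 split as before
train_idx <- sample(seq_len(nrow(College)), size = floor(0.7 * nrow(College)))
train <- College[train_idx, ]
test  <- College[-train_idx, ]

# Start from the full linear model with every predictor
full_model <- lm(Grad.Rate ~ ., data = train)

# Stepwise search by AIC; direction = "both" lets terms be dropped
# and later re-added. trace = 0 suppresses the step-by-step log.
step_model <- step(full_model, direction = "both", trace = 0)
formula(step_model)  # predictors retained by the search

# Hypothetical helper implementing the RMSE formula from the assignment
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

step_rmse_train <- rmse(train$Grad.Rate, predict(step_model, newdata = train))
step_rmse_test  <- rmse(test$Grad.Rate,  predict(step_model, newdata = test))
```

Comparing step_rmse_test with the test RMSE values from Ridge and LASSO on the same split gives a like-for-like basis for the discussion.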
Report
Refer to the attached rubric for more details on the report. The report should contain a well-written cover/title page, introduction, body, conclusion, and references. It must follow APA format and have at least 1000 words (excluding the title page and references page). All R code used for your report should be included in an appendix at the end of the report.
Graphs, figures, charts, and tables are very useful visual aids for communicating your results to your readers. However, such items should not be included in the report unless they are clearly described and interpreted. Please also use subheadings to make your assignment more reader-friendly.
Format & Guidelines
The report should use the following structure:
- Title page
- Introduction
- Analysis
- Conclusion/Interpretations
- References

