Data Mining Using R for Data Analysis and Graphics Worksheet

Daniel Kevins

For the exercises below, use the sonar training and testing data sets. After you read in the data, convert the variable V61 to a factor.
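As a starting point, the read-and-convert step might look like the sketch below. It assumes the files are comma-separated with no header row, so that R assigns the default column names V1 through V61. The function name read_sonar and the file names in the usage comments are hypothetical placeholders, not names given in the worksheet.

```r
# Minimal sketch, assuming a headerless comma-separated file.
# read_sonar is a hypothetical helper name.
read_sonar <- function(path) {
  # header = FALSE makes R name the columns V1, V2, ..., V61
  dat <- read.csv(path, header = FALSE)
  # The class label must be a factor so the tree methods do classification
  dat$V61 <- as.factor(dat$V61)
  dat
}

# Usage (hypothetical file names; substitute the paths you were given):
# sonar_train <- read_sonar("sonar_train.csv")
# sonar_test  <- read_sonar("sonar_test.csv")
# str(sonar_train$V61)
```

If the files do have a header row or use another separator, adjust the read.csv arguments accordingly.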

1. In this exercise, you will experiment with bagging. The number of bootstrap samples that bagging uses is a parameter that you can control. The default is nbagg = 25; that is, 25 bootstrap samples are taken, and hence 25 trees are built. Fit bagging models with 5, 11, 25, 101, 401, and 1601 bootstrap samples. Make a table or graph that shows the training accuracy and the testing accuracy vs. the number of bootstrap samples. Discuss what you see.

2. In this exercise, you will experiment with boosting. The number of iterations that boosting uses is a parameter that you can control. The default is iter = 50. Fit boosting models with 5, 11, 25, 101, 401, and 1601 iterations. Make a table or graph that shows the training accuracy and the testing accuracy vs. the number of iterations. Discuss what you see.

3. In this exercise you will experiment with random forests. The size of the forest is a parameter that you can control. The default is ntree = 500. Fit random forests with 5, 11, 25, 101, 401, and 1601 trees. Make a table or graph that shows the training accuracy and the testing accuracy vs the number of trees. Discuss what you see.

4. Compare your experience running the bagging, boosting, and random forest algorithms, and compare their accuracy.
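For exercise 1, one possible way to build the accuracy table is sketched below. It assumes the bagging function from the ipred package, whose default is nbagg = 25; the worksheet does not name the package, so treat that as an assumption. The function name bagging_sweep and the train/test arguments are hypothetical.

```r
library(ipred)  # assumption: bagging() from ipred (default nbagg = 25)

# Hypothetical helper: fit one bagging model per size and record accuracy.
bagging_sweep <- function(train, test, sizes = c(5, 11, 25, 101, 401, 1601)) {
  # Fraction of rows whose predicted class matches the true label in V61
  accuracy <- function(fit, dat) {
    mean(as.character(predict(fit, newdata = dat)) == as.character(dat$V61))
  }
  rows <- lapply(sizes, function(n) {
    fit <- bagging(V61 ~ ., data = train, nbagg = n)
    data.frame(nbagg = n,
               train_acc = accuracy(fit, train),
               test_acc = accuracy(fit, test))
  })
  do.call(rbind, rows)
}

# Usage, assuming data frames sonar_train and sonar_test already exist:
# set.seed(1)
# bagging_sweep(sonar_train, sonar_test)
```

The training accuracy here is computed by predicting on the same rows the model was fit to, so expect it to be optimistic relative to the testing accuracy.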
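For exercise 2, a similar sketch is given below. The default iter = 50 matches the ada function in the ada package, so that package is assumed here; it only handles a two-class response, which fits the sonar labels. The function name boosting_sweep and its arguments are hypothetical.

```r
library(ada)  # assumption: ada() from the ada package (default iter = 50)

# Hypothetical helper: fit one boosting model per iteration count.
boosting_sweep <- function(train, test, sizes = c(5, 11, 25, 101, 401, 1601)) {
  # Fraction of rows whose predicted class matches the true label in V61
  accuracy <- function(fit, dat) {
    mean(as.character(predict(fit, newdata = dat)) == as.character(dat$V61))
  }
  rows <- lapply(sizes, function(n) {
    fit <- ada(V61 ~ ., data = train, iter = n)
    data.frame(iter = n,
               train_acc = accuracy(fit, train),
               test_acc = accuracy(fit, test))
  })
  do.call(rbind, rows)
}

# Usage, assuming data frames sonar_train and sonar_test already exist:
# set.seed(1)
# boosting_sweep(sonar_train, sonar_test)
```

Boosting fits its trees one after another, so the run with 1601 iterations may take noticeably longer than the others.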
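For exercise 3, the same pattern can be sketched with the randomForest package, whose default is ntree = 500. The package choice is inferred from the parameter name, and the function name forest_sweep and its arguments are hypothetical.

```r
library(randomForest)  # assumption: randomForest() (default ntree = 500)

# Hypothetical helper: fit one random forest per forest size.
forest_sweep <- function(train, test, sizes = c(5, 11, 25, 101, 401, 1601)) {
  # Passing newdata explicitly gives predictions from the full forest.
  # Calling predict(fit) with no newdata would return out-of-bag
  # predictions instead, which is a different quantity.
  accuracy <- function(fit, dat) {
    mean(as.character(predict(fit, newdata = dat)) == as.character(dat$V61))
  }
  rows <- lapply(sizes, function(n) {
    fit <- randomForest(V61 ~ ., data = train, ntree = n)
    data.frame(ntree = n,
               train_acc = accuracy(fit, train),
               test_acc = accuracy(fit, test))
  })
  do.call(rbind, rows)
}

# Usage, assuming data frames sonar_train and sonar_test already exist:
# set.seed(1)
# forest_sweep(sonar_train, sonar_test)
```

When discussing the results, it may help to say which training figure you report, since the full-forest and out-of-bag numbers can differ a lot.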
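For exercise 4, two small base R helpers might make the comparison easier: one to time a fit and one to plot the testing accuracy of all three methods on a single graph. Both names (time_fit, plot_test_accuracy) and the expected layout of the result tables are hypothetical choices for this sketch, which also assumes every method was run with the same list of sizes.

```r
# Elapsed wall-clock seconds for one fit; fit_fun is a zero-argument function.
time_fit <- function(fit_fun) {
  unname(system.time(fit_fun())["elapsed"])
}

# results: named list of data frames, one per method. Assumed layout:
# the first column holds the ensemble size and a column named test_acc
# holds the testing accuracy.
plot_test_accuracy <- function(results) {
  sizes <- results[[1]][[1]]
  # One column per method, one row per ensemble size
  acc <- sapply(results, function(r) r$test_acc)
  k <- ncol(acc)
  matplot(sizes, acc, type = "b", log = "x", lty = 1,
          pch = seq_len(k), col = seq_len(k),
          xlab = "Bootstrap samples / iterations / trees (log scale)",
          ylab = "Testing accuracy")
  legend("bottomright", legend = colnames(acc), lty = 1,
         pch = seq_len(k), col = seq_len(k))
  invisible(acc)
}

# Usage, assuming bag_tab, boost_tab and rf_tab are your three result tables:
# plot_test_accuracy(list(bagging = bag_tab, boosting = boost_tab, forest = rf_tab))
# time_fit(function() randomForest::randomForest(V61 ~ ., data = sonar_train))
```

The log scale on the horizontal axis keeps the small sizes (5, 11, 25) from being squeezed against the left edge by 1601.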

Note: the dataset will be sent later.
