RStudio homework

Daniel Kevins

In this exercise you will train a spam classifier using support vector machines. We will use the spam
dataset that comes with the {kernlab} package. First, we will split the spam data randomly into
two halves: one half will serve as the training data, the other half as the test data.
The target variable is type, a binary factor with the classes spam and nonspam.
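
The random split described above could be sketched as follows. This is only one possible way to do it: the seed value and the names train_idx, spam_train and spam_test are assumptions chosen for illustration, not part of the exercise.

```r
library(kernlab)   # provides the spam dataset

data(spam)

set.seed(1)        # arbitrary seed (assumption), only to make the split reproducible
n <- nrow(spam)

# Draw half of the row indices at random for training; the rest become the test set
train_idx  <- sample(n, size = floor(n / 2))
spam_train <- spam[train_idx, ]
spam_test  <- spam[-train_idx, ]

# Check the sizes of both halves and the class balance in the training half
dim(spam_train)
dim(spam_test)
table(spam_train$type)
```

Because the rows are sampled without replacement, no email appears in both halves, so the test half can be used for an honest estimate of the classification error.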

0. Look at the help page for the dataset to find out what the different columns mean (hint:
?spam).

1. Fit a support vector classifier using svm() on the training data. type is the target and all
other variables can be used as predictors (hint: you can use the . notation which automatically
includes all columns of the data.frame as predictors except the target variable).

2. Predict spam/nonspam classes for the data in the test dataset. How does the predicted
classification compare with the true classes? What is the classification error?

3. Can you improve the classification accuracy? (Hint: Start by exploring different values of
the cost argument and different sets of predictors.)

4. How easy is it to interpret the classification produced by the SVM? Compare the interpretability
of the SVM model to that of a regression model (e.g., the one from the exercises above).
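
A minimal sketch of steps 1 to 3, under stated assumptions: svm() is taken to be the function from the {e1071} package (the exercise does not name the package), a linear kernel is used to match the term "support vector classifier", and the seed, the cost grid and all variable names are illustrative choices only.

```r
library(kernlab)  # spam dataset
library(e1071)    # assumed source of svm()

# Recreate the random half/half split so this block runs on its own
data(spam)
set.seed(1)       # arbitrary seed (assumption)
train_idx  <- sample(nrow(spam), size = floor(nrow(spam) / 2))
spam_train <- spam[train_idx, ]
spam_test  <- spam[-train_idx, ]

# Step 1: fit on the training half; "type ~ ." uses all other columns as predictors
fit <- svm(type ~ ., data = spam_train, kernel = "linear", cost = 1)

# Step 2: predict the test half and compare with the true classes
pred       <- predict(fit, newdata = spam_test)
conf_mat   <- table(predicted = pred, actual = spam_test$type)
test_error <- mean(pred != spam_test$type)  # share of misclassified emails
conf_mat
test_error

# Step 3: refit with a few cost values and record the test error for each
costs  <- c(0.01, 0.1, 1)
errors <- sapply(costs, function(cost_value) {
  m <- svm(type ~ ., data = spam_train, kernel = "linear", cost = cost_value)
  mean(predict(m, newdata = spam_test) != spam_test$type)
})
data.frame(cost = costs, test_error = errors)
```

Choosing cost by looking at the test error gives an optimistic estimate of performance. A cleaner approach is to select cost by cross-validation on the training half only, for example with tune() from {e1071}, and to use the test half once at the end.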
