Linear Algebra and Decision Tree Classifier Machine Learning Worksheet

Question 1. Decision Tree Classifier

Data: The zip file “hw2.q1.data.zip” contains 3 CSV files:

  • “hw2.q1.train.csv” contains 10,000 rows and 26 columns. The first column ‘y’ is the output variable with 2 classes: 0, 1. The remaining 25 columns contain input features: x_1, …, x_25.
  • “hw2.q1.test.csv” contains 2,000 rows and 26 columns. The first column ‘y’ is the output variable with 2 classes: 0, 1. The remaining 25 columns contain input features: x_1, …, x_25.
  • “hw2.q1.new.csv” contains 30 rows and 26 columns. The first column ‘ID’ is an identifier for 30 unlabeled samples. The remaining 25 columns contain input features: x_1, …, x_25.

Task 1.

Use 5-fold cross-validation with the 10,000 labeled examples from “hw2.q1.train.csv” to determine the fewest rules with which a decision tree classifier can achieve a mean cross-validation accuracy of at least 0.96. Report the number of rules needed, the cross-validation accuracy obtained, and all the hyper-parameter values for the DecisionTreeClassifier.
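
The search described in this task could be sketched roughly as follows. This is only one possible approach, and it rests on assumptions the worksheet does not state: each leaf of the tree is counted as one rule, the number of leaves is capped with scikit-learn's max_leaf_nodes hyper-parameter, and random_state is fixed for repeatability. The names fewest_rules, target, and max_rules are hypothetical, and synthetic data from make_classification stands in for “hw2.q1.train.csv” so that the sketch runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def fewest_rules(X, y, target=0.96, max_rules=20, cv=5, seed=0):
    """Try trees with 2, 3, ... leaves and stop at the first one whose
    mean cross-validation accuracy reaches `target`.

    Assumption: one leaf = one rule (one root-to-leaf path).
    Returns (n_leaves, mean_cv_accuracy, hyper_parameters) or None.
    """
    for k in range(2, max_rules + 1):
        clf = DecisionTreeClassifier(max_leaf_nodes=k, random_state=seed)
        # For a classifier, an integer cv means stratified k-fold splitting.
        mean_acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
        if mean_acc >= target:
            # Refit on all rows to count the leaves actually grown.
            n_leaves = clf.fit(X, y).get_n_leaves()
            return n_leaves, round(float(mean_acc), 4), clf.get_params()
    return None  # target not reached within max_rules leaves


# Stand-in data with 25 features; NOT the worksheet's data.
# With the real file, one might instead read it with pandas and take
# column 'y' as the label and the remaining 25 columns as features.
X, y = make_classification(
    n_samples=1000, n_features=25, n_informative=5, random_state=0
)
print(fewest_rules(X, y))
```

The loop starts from the smallest possible tree, so the first configuration that meets the threshold is also the one with the fewest leaves among those tried. Other hyper-parameters (for example max_depth or min_samples_leaf) could limit tree size as well; max_leaf_nodes is used here only because it maps directly onto the leaf count.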

Number of rules needed: ……5………….

Mean cross-validation accuracy: ………………………. (rounded to 4 decimal places)

Hyper-parameter values for selected DecisionTreeClassifier model:

Task 2.

Train a DecisionTreeClassifier with the hyper-parameter values determined in Task 1 on all 10,000 training samples and use it to predict the output class ‘y’ for the 2,000 examples in “hw2.q1.test.csv”. Report the following:

  • Accuracy on 2,000 test examples: ……………………(rounded to 4 decimal places)
  • Classification report for the 2,000 test examples:

  • Of the 952 test samples that belong to class y=1, how many are correctly predicted (according to your classification report)?
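
A minimal sketch of how the three items above might be produced with scikit-learn is shown here. It assumes synthetic stand-in data rather than the worksheet's CSV files, and the hyper-parameter values in params are placeholders, not the answer to Task 1. The count of correctly predicted y=1 samples is recovered from the classification report as recall × support, since recall for a class is the number of correct predictions divided by the number of true samples of that class.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; with the real files, the train and test CSVs would be
# loaded separately instead of splitting one synthetic set.
X, y = make_classification(
    n_samples=1200, n_features=25, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=200, random_state=0, stratify=y
)

# Placeholder hyper-parameters (hypothetical, not the Task 1 result).
params = {"max_leaf_nodes": 8, "random_state": 0}
clf = DecisionTreeClassifier(**params).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Item 1: test accuracy, rounded to 4 decimal places.
test_accuracy = round(accuracy_score(y_test, y_pred), 4)
print("Test accuracy:", test_accuracy)

# Item 2: the classification report.
print(classification_report(y_test, y_pred, digits=4))

# Item 3: correctly predicted class-1 samples = recall * support.
report = classification_report(y_test, y_pred, output_dict=True)
correct_1 = int(round(report["1"]["recall"] * report["1"]["support"]))
print("Correctly predicted y=1:", correct_1)

# Cross-check against the confusion matrix (true positives for class 1).
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("True positives from confusion matrix:", tp)
```

The confusion-matrix line is only a sanity check: the true-positive count should agree with the value derived from the report.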
