Problem 1 (15 points). Suppose the covariance matrix corresponding to
standardized variables 1, 2 is
Σ= [1 1]
Determine the principal components of the variables and the variables 1, 2 after
performing PCA on 1, 2.
Problem 2 (28 points). Consider the training examples shown in the following
table for a binary classification problem.
a) Compute the Gini index for the overall collection of training examples.
b) Compute the Gini index for the Customer ID attribute.
c) Compute the Gini index for the Gender attribute.
d) Compute the Gini index for the Car Type attribute using multiway split.
e) Compute the Gini index for the Shirt Size attribute using multiway split.
f) Which attribute is better, Gender, Car Type, or Shirt Size?
g) Explain why Customer ID should not be used as the attribute test condition
even though it has the lowest Gini.
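As a rough aid for checking the hand computations in parts a) through e), the Gini index of a node and the weighted Gini index of a multiway split can be sketched as below. The function names `gini` and `gini_split` and the toy data are assumptions made for illustration; the toy data is not the table from the problem.

```python
from collections import Counter


def gini(labels):
    """Gini index of a collection of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(values, labels):
    """Weighted Gini index of a multiway split, one child per distinct attribute value."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(group) / n * gini(group) for group in groups.values())


# Hypothetical toy data (NOT the table from the problem).
toy_labels = ["C0", "C0", "C1", "C1"]
toy_attr = ["x", "y", "x", "y"]   # splits the classes evenly in both children
toy_ids = [1, 2, 3, 4]            # unique per record, like an ID column

overall = gini(toy_labels)
by_attr = gini_split(toy_attr, toy_labels)
by_id = gini_split(toy_ids, toy_labels)
```

In this toy data the ID-like column produces one pure child per record, so its weighted Gini index is zero even though it says nothing about unseen records, which is the situation part g) asks about.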
Problem 3 (24 points). Consider the training examples shown in the following
table for a binary classification problem.
a) What is the entropy of this collection of training examples with respect to
the positive class?
b) What are the information gains of 1 and 2 relative to these training
examples?
c) For 3, which is a continuous attribute, compute the information gain for
every possible split.
d) What is the best split (among 1, 2, and 3) according to the information
gain?
e) What is the best split (between 1 and 2) according to the classification
error rate?
f) What is the best split (between 1 and 2) according to the Gini index?
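The entropy and information-gain calculations in parts a) through d) can be checked with a minimal sketch like the one below. It assumes base-2 logarithms and, for the continuous attribute, candidate thresholds at the midpoints between consecutive distinct sorted values. The function names and the toy data are hypothetical and are not taken from the problem's table.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of a multiway split on a discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def threshold_gains(values, labels):
    """Information gain for each candidate binary split (value <= t) of a continuous attribute."""
    distinct = sorted(set(values))
    gains = {}
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2  # assumed convention: midpoint between neighbors
        side = [v <= t for v in values]
        gains[t] = info_gain(side, labels)
    return gains


# Hypothetical toy data (NOT the table from the problem).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = ["+", "+", "-", "-"]

gains = threshold_gains(toy_x, toy_y)
best_t = max(gains, key=gains.get)
```

The same grouping logic could be reused for parts e) and f) by swapping `entropy` for a classification-error or Gini impurity function.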

