- 1. Discuss the advantages and disadvantages of using sampling to reduce the number of data objects that need to be displayed. Would simple random sampling (without replacement) be a good approach to sampling? Why or why not?
- 2. How might you address the problem that a histogram depends on the number and location of the bins? Show that the entropy of a node never increases after splitting it into smaller successor nodes.
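The entropy claim in the second part can be checked numerically before attempting a proof. The sketch below uses a hypothetical class distribution and one arbitrary binary split (both invented for illustration); the weighted entropy of the children does not exceed the parent's entropy.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical parent node with a mixed class distribution.
parent = ['+'] * 4 + ['-'] * 6

# One possible binary split of the parent's records.
left, right = parent[:5], parent[5:]

weighted_child = (len(left) / len(parent)) * entropy(left) + \
                 (len(right) / len(parent)) * entropy(right)

print(entropy(parent))   # ≈ 0.971
print(weighted_child)    # ≈ 0.361, not larger than the parent's entropy
```

The general proof follows the same pattern: the children's weighted entropy is a convex combination, and entropy is a concave function of the class proportions.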
- 3. Compute a two-level decision tree using the greedy approach described in this chapter. Use the classification error rate as the criterion for splitting. What is the overall error rate of the induced tree?
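The exercise's own training set is not reproduced here, but the splitting criterion it asks for can be sketched. A node's classification error is one minus the fraction of its majority class, and a candidate split is scored by the weighted error of its children; the greedy approach picks the split with the lowest weighted error at each level. The labels below are hypothetical.

```python
from collections import Counter

def classification_error(labels):
    """Error rate of a node: 1 - fraction of the majority class."""
    if not labels:
        return 0.0
    return 1.0 - Counter(labels).most_common(1)[0][1] / len(labels)

def split_error(children):
    """Weighted error of a candidate split (children = list of label lists)."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * classification_error(c) for c in children)

# Hypothetical node and one candidate split of its records.
print(classification_error(['+', '+', '-', '-', '-']))  # 0.4
print(split_error([['+', '+'], ['-', '-', '-']]))       # 0.0 (both children pure)
```

The overall error rate of the induced tree is then the weighted error over its leaves, computed the same way as `split_error`.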
- 4. Consider the one-dimensional data set shown below:

  | X | 0.5 | 3.0 | 4.5 | 4.6 | 4.9 | 5.2 | 5.3 | 5.5 | 7.0 | 9.5 |
  |---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
  | Y | –   | –   | +   | +   | +   | –   | –   | +   | –   | –   |
- Classify the data point x = 5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors (using majority vote).
- Repeat the previous analysis using the distance-weighted voting approach.
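Both parts of this exercise can be worked through with a short script. The sketch below hard-codes the data set above and implements plain majority voting plus a distance-weighted vote; the 1/d² weighting scheme is one common choice, assumed here since the exercise does not fix a weight function.

```python
from collections import Counter

# Data set from the exercise ('-' stands for the minus class).
X = [0.5, 3.0, 4.5, 4.6, 4.9, 5.2, 5.3, 5.5, 7.0, 9.5]
Y = ['-', '-', '+', '+', '+', '-', '-', '+', '-', '-']

def knn_majority(x, k):
    """Majority vote among the k nearest neighbors of x."""
    neighbors = sorted(zip(X, Y), key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_weighted(x, k):
    """Distance-weighted vote: each neighbor's vote counts 1/d^2."""
    neighbors = sorted(zip(X, Y), key=lambda p: abs(p[0] - x))[:k]
    scores = {}
    for xi, label in neighbors:
        scores[label] = scores.get(label, 0.0) + 1.0 / (xi - x) ** 2
    return max(scores, key=scores.get)

for k in (1, 3, 5, 9):
    print(k, knn_majority(5.0, k), knn_weighted(5.0, k))
```

With this data, the majority vote flips as k grows, while the 1/d² weighting keeps favoring the three positive points clustered near x = 5.0.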

