Please use the following dataset to carry out the analysis in RStudio.
Question 1: Explore and prepare the data using the str() function. Display the proportions (empirical probabilities) of the two classes (‘benign’ and ‘malignant’) of the variable “diagnosis”, which we plan to predict, and normalize the entire dataset.
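A minimal R sketch of Question 1 follows. Because the dataset itself is not included here, the code builds a small simulated stand-in with hypothetical columns (`radius_mean`, `texture_mean`, `area_mean`) and assumes the raw `diagnosis` codes are "B"/"M". It also assumes that "normalize" means min-max scaling to [0, 1]. If the real file has an ID column, drop it before normalizing, since it carries no diagnostic information.

```r
# Simulated stand-in for the real dataset, which is not included here.
# Column names and values are hypothetical; with the real data use, e.g.,
# wbcd <- read.csv("your_file.csv")   # hypothetical file name
set.seed(42)
n <- 100
wbcd <- data.frame(
  diagnosis    = sample(c("B", "M"), n, replace = TRUE, prob = c(0.6, 0.4)),
  radius_mean  = rnorm(n, mean = 14, sd = 3),
  texture_mean = rnorm(n, mean = 19, sd = 4),
  area_mean    = runif(n, min = 150, max = 2500)
)

# Explore: column types and the first few values of each column
str(wbcd)

# Assumption: raw codes are "B"/"M"; turn them into a labeled factor
wbcd$diagnosis <- factor(wbcd$diagnosis, levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))

# Class proportions (empirical probabilities), shown as percentages
round(prop.table(table(wbcd$diagnosis)) * 100, 1)

# Min-max normalization: rescales a numeric vector to the range [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Normalize every feature column (all columns except the target)
features <- setdiff(names(wbcd), "diagnosis")
wbcd_n <- as.data.frame(lapply(wbcd[features], normalize))
summary(wbcd_n)
```

After normalization every feature should show a minimum of 0 and a maximum of 1 in `summary()`, which is a quick sanity check that no column was skipped.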
Question 2: Create training and testing datasets, develop the model using the k-nearest neighbors (knn) classifier algorithm, evaluate it with different values of k, and propose the best value of k.
- Split the dataset into training and testing sets in a 70%:30% ratio.
- Develop the model using the knn classifier algorithm.
- Evaluate the model’s performance for different values of k, and suggest the best model.
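The three steps above could be sketched in R roughly as follows. This is one possible approach, not a prescribed solution: it uses `knn()` from the `class` package, regenerates hypothetical simulated data (the real dataset is not shown), tries odd values of k from 1 to 25 (an arbitrary range), and ranks them by test accuracy with ties broken by the number of false negatives.

```r
library(class)  # knn(); a recommended package included with standard R installs

# Simulated stand-in data (hypothetical columns); replace with the real dataset
set.seed(123)
n <- 200
diagnosis <- factor(sample(c("Benign", "Malignant"), n, replace = TRUE,
                           prob = c(0.6, 0.4)),
                    levels = c("Benign", "Malignant"))
m <- as.integer(diagnosis == "Malignant")  # 1 for malignant, 0 for benign
wbcd <- data.frame(
  diagnosis    = diagnosis,
  radius_mean  = rnorm(n, mean = 14 + 3 * m, sd = 2),
  texture_mean = rnorm(n, mean = 19 + 2 * m, sd = 4),
  area_mean    = rnorm(n, mean = 600 + 300 * m, sd = 120)
)

# Min-max normalize all features, as in Question 1
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
features <- setdiff(names(wbcd), "diagnosis")
wbcd_n <- as.data.frame(lapply(wbcd[features], normalize))

# Random 70%:30% split into training and testing sets
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train_x <- wbcd_n[train_idx, ]
test_x  <- wbcd_n[-train_idx, ]
train_y <- wbcd$diagnosis[train_idx]
test_y  <- wbcd$diagnosis[-train_idx]

# Fit kNN for several k; odd k avoids tied votes between the two classes
k_values <- seq(1, 25, by = 2)
results <- data.frame(k = k_values, accuracy = NA_real_, false_neg = NA_integer_)
for (i in seq_along(k_values)) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k_values[i])
  results$accuracy[i] <- mean(pred == test_y)
  # False negatives: malignant tumors predicted as benign
  results$false_neg[i] <- sum(pred == "Benign" & test_y == "Malignant")
}
print(results)

# Best k: highest accuracy, ties broken by fewest false negatives
best <- results[order(-results$accuracy, results$false_neg), ][1, ]
print(best)

# Confusion matrix for the selected k
best_pred <- knn(train_x, test_x, train_y, k = best$k)
table(Predicted = best_pred, Actual = test_y)
```

Ranking by false negatives as well as accuracy reflects that, in a diagnostic setting, labeling a malignant tumor as benign is usually the more serious error. The question asks to normalize the whole dataset before splitting, and the sketch follows that; a stricter alternative computes the min and max on the training set only and applies them to the test set.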
Question 3: Using the same dataset and the same training/testing partition as above, build a logistic regression model to develop the best possible diagnostic machine learning algorithm, one that helps the medical team determine whether a tumor is malignant. In this analysis, standardize the data before fitting the logistic regression model. Provide a detailed explanation of the output and compare the results with those of the knn model in Question 2.
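A rough R sketch of Question 3 is below, under the same assumptions as before: the data are simulated with hypothetical columns, "standardize" is taken to mean z-scores via `scale()`, predictions are classified at a 0.5 probability threshold, and `k = 5` is only a placeholder for whatever k Question 2 selects. In a real analysis the Question 2 training indices should be reused so both models see exactly the same partition.

```r
library(class)  # knn(), used only for the side-by-side comparison

# Simulated stand-in data (hypothetical columns); replace with the real dataset
set.seed(123)
n <- 200
diagnosis <- factor(sample(c("Benign", "Malignant"), n, replace = TRUE,
                           prob = c(0.6, 0.4)),
                    levels = c("Benign", "Malignant"))
m <- as.integer(diagnosis == "Malignant")
wbcd <- data.frame(
  diagnosis    = diagnosis,
  radius_mean  = rnorm(n, mean = 14 + 3 * m, sd = 2),
  texture_mean = rnorm(n, mean = 19 + 2 * m, sd = 4),
  area_mean    = rnorm(n, mean = 600 + 300 * m, sd = 120)
)
features <- setdiff(names(wbcd), "diagnosis")

# Standardize: z-scores with mean 0 and standard deviation 1 per feature
wbcd_z <- as.data.frame(scale(wbcd[features]))
wbcd_z$diagnosis <- wbcd$diagnosis

# 70%:30% split; with real data, reuse the Question 2 indices instead
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- wbcd_z[train_idx, ]
test  <- wbcd_z[-train_idx, ]

# Logistic regression; glm() models the log-odds of the second level ("Malignant")
fit <- glm(diagnosis ~ ., data = train, family = binomial)
summary(fit)     # coefficients (log-odds scale), standard errors, Wald z tests
exp(coef(fit))   # odds ratios for a one-SD increase in each feature

# Predicted probability of malignancy; classify at a 0.5 threshold
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "Malignant", "Benign"),
               levels = levels(test$diagnosis))
print(table(Predicted = pred, Actual = test$diagnosis))
logit_acc <- mean(pred == test$diagnosis)
logit_fn  <- sum(pred == "Benign" & test$diagnosis == "Malignant")

# kNN on the same split (here on z-scored features); k = 5 is a placeholder
knn_pred <- knn(train[features], test[features], train$diagnosis, k = 5)
knn_acc  <- mean(knn_pred == test$diagnosis)
knn_fn   <- sum(knn_pred == "Benign" & test$diagnosis == "Malignant")

data.frame(model = c("logistic", "knn (k = 5)"),
           accuracy = c(logit_acc, knn_acc),
           false_neg = c(logit_fn, knn_fn))
```

When reading `summary(fit)`, a positive coefficient means a larger value of that standardized feature raises the estimated odds of malignancy; because all features share one scale, coefficient sizes are roughly comparable, although strongly correlated features can make individual coefficients unstable. Unlike knn, logistic regression returns a probability for each tumor, so the 0.5 threshold can be lowered if the team prefers fewer false negatives at the cost of more false positives.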

