Understanding Income Inequality of Americans: K-Nearest Neighbor Classifier Project

Theme: Understanding income inequality of Americans.

Problem:

We have been given census data on the socioeconomic and demographic attributes of US citizens (e.g., occupation, education, gender, race, marital status, capital gains, capital loss), together with each person's annual income category (less than or greater than $50,000). Using an appropriate sampling strategy, build a KNN classifier for the given data to identify high-income and low-income individuals.
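
Stratified sampling is one reasonable choice of "appropriate sampling strategy" here: census income data of this kind is often imbalanced, with many more records below $50,000 than above, and a purely random split can leave the training and test sets with different class proportions. Below is a minimal sketch in plain Python (the assignment itself calls for R) that keeps each income class's share the same in both sets. The function name `stratified_split`, the label strings `<=50K`/`>50K`, and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, train_frac, seed=42):
    """Split rows into train/test sets so each class keeps the same share.

    rows: list of feature vectors; labels: parallel list of class labels
    (assumed here to be strings such as "<=50K" and ">50K").
    train_frac: fraction of each class placed in the training set.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Shuffle within each class, then cut it at the same fraction.
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])

    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test
```

Calling it with several `train_frac` values (for example 0.5, 0.7 and 0.8) produces the different training/test ratios that Part 2 asks for, each with the same income mix.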

  • Part 1: Use appropriate data cleansing techniques to obtain the highest-quality data for modeling this problem. Detail your process and discuss the decisions you made while cleaning the data.
  • Part 2: Build K-Nearest Neighbors models for the given data using at least three different training/test sample size ratios, interpret the results, and convey them to stakeholders. Highlight key learning points such as the impact of training sample size, the determination of the optimal K, the confusion matrix, and model accuracy.
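
For Part 1, the right cleaning steps depend on how the data was delivered. The following is a minimal sketch that assumes each record arrives as a dictionary of raw strings, with missing values coded as `?`, stray surrounding spaces, and an occasional trailing period on the income label (as in the UCI Adult census files). The column names, including `income`, are hypothetical. It drops incomplete rows rather than imputing them; that is simple but discards data, so the write-up should justify the choice.

```python
# Hypothetical numeric column names; adjust to the actual dataset.
NUMERIC_COLS = ("age", "capital_gain", "capital_loss", "hours_per_week")
MISSING = {"?", ""}  # assumed missing-value codes


def clean_records(records, numeric_cols=NUMERIC_COLS):
    """Return cleaned copies of `records`, dropping rows with missing values."""
    cleaned = []
    for raw in records:
        # Strip stray whitespace from every column name and value.
        rec = {str(k).strip(): str(v).strip() for k, v in raw.items()}

        # Drop the row if any field holds a missing-value code.
        if any(v in MISSING for v in rec.values()):
            continue

        # Unify income labels such as ">50K." and ">50K".
        rec["income"] = rec["income"].rstrip(".")

        # Convert numeric columns from text to float for distance calculations.
        for col in numeric_cols:
            if col in rec:
                rec[col] = float(rec[col])

        cleaned.append(rec)
    return cleaned
```

Because KNN works on distances, categorical columns such as occupation or marital status would still need a numeric encoding (for example, one-hot dummy variables) before modeling.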
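
For Part 2, the core loop has three steps. First, scale the features using ranges learned from the training set only. Next, classify each test record by a majority vote of its K nearest training records (Euclidean distance). Finally, tabulate a confusion matrix and accuracy for each candidate K. Below is a minimal, dependency-free Python sketch that assumes the features are already numeric and treats `>50K` as the positive class; the function names are illustrative. For brevity it picks K by test accuracy, although a separate validation set or cross-validation would give a less optimistic estimate. Running it once per training/test ratio shows how sample size affects both accuracy and the best K.

```python
import math
from collections import Counter


def min_max_fit(rows):
    """Learn (min, max) for each feature column from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows, bounds):
    """Rescale rows to roughly [0, 1] using training-set bounds."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    dists = sorted((math.dist(x, t), y) for t, y in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted, positive=">50K"):
    """Count true/false positives and negatives for a two-class problem."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(a == positive and p == positive for a, p in pairs),
        "FP": sum(a != positive and p == positive for a, p in pairs),
        "FN": sum(a == positive and p != positive for a, p in pairs),
        "TN": sum(a != positive and p != positive for a, p in pairs),
    }


def evaluate(train_X, train_y, test_X, test_y, ks=(1, 3, 5, 7, 9)):
    """Return the K with the highest accuracy plus per-K results."""
    bounds = min_max_fit(train_X)  # fit scaling on training data only
    tr = min_max_apply(train_X, bounds)
    te = min_max_apply(test_X, bounds)

    results = {}
    for k in ks:  # odd K avoids tied votes with two classes
        preds = [knn_predict(tr, train_y, x, k) for x in te]
        cm = confusion_matrix(test_y, preds)
        accuracy = (cm["TP"] + cm["TN"]) / len(test_y)
        results[k] = (cm, accuracy)

    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results
```

Accuracy alone can look good on imbalanced data simply by predicting `<=50K` for everyone, which is why the confusion matrix is worth reporting next to it.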

The R code should be written in an R Markdown file, and the report should be generated as a Word file using the knitr package. Ensure that the analytics output is properly formatted and that the report has a professional look and feel.
