Project Description
For this project you will use Lending Club loan data. I have already cleaned up the data set for you. The ultimate goal of the project is to predict whether a given customer will default on the loan. You need to run several machine learning algorithms to perform this task. Your main challenge is to make the prediction with multiple models in SPSS Modeler and compare their performance. You are also welcome to pick your own data set, but please come to me for approval before you start on your project. Extra points will be given for choosing your own problem.
Some facts about the data set.
1. The data consist of 140 features for almost 40,000 individual loan records from the Lending Club database.
2. The target variable is the loan status. ‘Charged off’ denotes default and ‘Fully paid’ denotes no default.
3. Although I have already cleaned up the data, some features (variables) are either too messy to work with or probably not required for building your model. So use your judgment before assigning these features as inputs to the model.
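To make the target definition and the feature screening concrete, here is a minimal sketch in plain Python (outside SPSS Modeler). It is only an illustration under assumptions: the column names `loan_status`, `loan_amnt`, and `desc`, the sample rows, and the 50% missing-value cutoff are all hypothetical and are not taken from the actual data set.

```python
from typing import Optional

# 1 = default, 0 = not default, following the target definition above.
LABELS = {"charged off": 1, "fully paid": 0}


def encode_status(status: str) -> Optional[int]:
    """Map a loan-status string to 1 or 0; return None for any other status."""
    return LABELS.get(status.strip().lower())


def missing_fraction(rows, column):
    """Fraction of rows where `column` is absent, None, or an empty string."""
    missing = sum(1 for r in rows if r.get(column) in (None, ""))
    return missing / len(rows)


def usable_features(rows, target, max_missing=0.5):
    """Keep non-target columns whose missing fraction is at most max_missing."""
    columns = sorted({c for r in rows for c in r if c != target})
    return [c for c in columns if missing_fraction(rows, c) <= max_missing]


# Made-up rows for illustration only (hypothetical column names and values).
rows = [
    {"loan_status": "Charged off", "loan_amnt": 5000, "desc": ""},
    {"loan_status": "Fully paid", "loan_amnt": 12000, "desc": ""},
    {"loan_status": "Fully paid", "loan_amnt": 8000, "desc": "car repair"},
]

targets = [encode_status(r["loan_status"]) for r in rows]
features = usable_features(rows, target="loan_status")
print(targets)
print(features)
```

In this toy example the mostly empty `desc` column is screened out and only `loan_amnt` is kept as an input. A missing-value cutoff is just one possible screening rule; which features to keep is still a judgment call.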
This is what I need to submit:
1. A report (Word/PDF file) that is 5 pages long (double-spaced, including tables and figures). The report should include:
- An Introduction: problem description and definition
- Data description
- Method
- Result
- Discussion
Tips:
1. Run multiple models to find the one with the best performance. Note that this is a classification problem (supervised learning), so make sure you use the right models.
2. Go back and adjust the selection of input variables. Select or deselect some variables and see whether this gives you better performance.
3. You should try at least two to three classification techniques and report the best-performing one.
4. The performance evaluation and comparison should be discussed in full detail. You need to include the predictor importance result from the rule induction model and discuss it. Also, in the result part of your project, highlight the best accuracy you achieved and the corresponding model settings.
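The comparison step can be sketched in plain Python as follows. This is not SPSS Modeler output: the model names are placeholders and the labels and predictions are made up purely to show how accuracy, precision, and recall would be computed and compared once each model's predictions are exported.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn), treating 1 (default) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(actual, predicted):
    """Compute accuracy, precision, and recall for one model's predictions."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels (1 = default, 0 = not default) and predictions.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predictions = {
    "rule_induction": [1, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "logistic_regression": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "neural_net": [1, 0, 1, 1, 0, 0, 0, 1, 0, 0],
}

results = {name: evaluate(actual, pred) for name, pred in predictions.items()}
best = max(results, key=lambda name: results[name]["accuracy"])

for name, metrics in results.items():
    print(name, metrics)
print("best by accuracy:", best)
```

In this toy data two models tie on accuracy but differ in recall, which is one reason to report more than a single accuracy figure when discussing the comparison, especially if defaults turn out to be the minority class.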

