Assignment #3
Task 1-Perform an association analysis
Step 1: First download the “Associations.xls” file found on Canvas and import it into RapidMiner as in assignment 2 (click the File tab and choose “add data”).
Step 2: Drag and drop the file to the Process view.
Step 3: In the Operators view, type “select attributes” to search for the “Select Attributes” operator. Drag and drop the operator to the Process view and connect the two operators as follows. In the Parameters view, set attribute filter type to “single” and select the attribute “Seq#”. Finally, check “invert selection” so that all attributes except Seq# are selected.
Step 4a: In the Operators view, type “Numerical to Binomial” to search for “Numerical to Binomial” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, set attribute filter type as “all” to convert all attributes from Numerical to Binomial. Then click the blue run button to run the operator.
Step 4b: Click the “Results” tab and check the “ExampleSet (Numerical to Binomial)” tab to make sure all attributes have been converted to binomial.
Step 5: Switch back to the Design view. In the Operators view, type “FP-Growth” to search for the “FP-Growth” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, uncheck “find min number of itemsets”, then set min support to 0.1 and max items to 3.
Step 6a: In the Operators view, type “create association” to search for “Create Association Rules” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, select “confidence” for criterion and set min confidence at 0.4. Click the blue run button.
Step 6b: Click “Results” tab and then “AssociationRules (Create Association Rules)” tab to check out the values of confidence and lift.
Step 7: Save the RapidMiner file. Click the “File” button and select “Export Process” to save the file to a local path.
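The support, confidence, and lift values inspected in step 6b can be reproduced by hand on a small table. The following is a minimal sketch, not what RapidMiner runs internally: it enumerates itemsets by brute force instead of using FP-Growth, the item names and counts are made up for illustration (they are not taken from Associations.xls), and the rule that any non-zero value becomes true is an assumption about the binomial conversion. The thresholds mirror steps 5 and 6a.

```python
from itertools import combinations

# Hypothetical purchase counts per transaction (NOT the Associations.xls data).
ITEMS = ["bread", "milk", "eggs", "jam"]
COUNTS = [
    [1, 1, 0, 0],
    [2, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]


def to_binomial(rows):
    """Turn numeric values into True/False (assumption: non-zero means True)."""
    return [[value != 0 for value in row] for row in rows]


def support(rows, itemset):
    """Fraction of rows in which every item index in `itemset` is True."""
    hits = sum(all(row[i] for i in itemset) for row in rows)
    return hits / len(rows)


def frequent_itemsets(rows, min_support=0.1, max_items=3):
    """Brute-force search for itemsets with support >= min_support."""
    found = {}
    for size in range(1, max_items + 1):
        for itemset in combinations(range(len(rows[0])), size):
            s = support(rows, itemset)
            if s >= min_support:
                found[frozenset(itemset)] = s
    return found


def association_rules(itemsets, min_confidence=0.4):
    """Split each frequent itemset into premise -> conclusion rules."""
    rules = []
    for itemset, s in itemsets.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                conclusion = itemset - premise
                # confidence = support(premise and conclusion) / support(premise)
                confidence = s / itemsets[premise]
                # lift = confidence / support(conclusion)
                lift = confidence / itemsets[conclusion]
                if confidence >= min_confidence:
                    rules.append((premise, conclusion, s, confidence, lift))
    return rules


rows = to_binomial(COUNTS)
rules = association_rules(frequent_itemsets(rows))
for premise, conclusion, s, confidence, lift in rules:
    left = ", ".join(ITEMS[i] for i in sorted(premise))
    right = ", ".join(ITEMS[i] for i in sorted(conclusion))
    print(f"{left} -> {right}: support={s:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the premise and conclusion occur together more often than they would if they were independent; a lift below 1 suggests the opposite.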
Task 2-Perform a cluster analysis
Step 1: First download the “BathSoap.xls” file found on Canvas and import it into RapidMiner as in assignments 1 and 2. Make sure to select “DM_Sheet” when importing the file.
Step 2: Drag and drop the file to the Process view.
Step 3: To run the first model, in the Operators view, type “select attributes” to search for the “Select Attributes” operator. Drag and drop the operator to the Process view and connect the two operators as follows. In the Parameters view, set attribute filter type to “subset” and click the “select attribute” button. Move the attributes related to purchase behavior (No. of Brands, Brand Runs, Total Volume, No. of Trans, Value, Trans / Brand Runs, Vol/Tran, Avg. Price, PurVol No Promo – %, Pur Vol Promo 6 %, Pur Vol Other Promo %) from the “Attributes” column to the “Selected Attributes” column. Then click the “Apply” button.
Step 4: In the Operators view, type “Cluster” to search for the “k-Means” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, check “add cluster attribute” and set k to 3. Make sure to set measure types to “NumericalMeasures” and numerical measure to “EuclideanDistance”.
Step 5a: In the Operators view, type “cluster distance” to search for “Cluster Distance Performance” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, set main criterion as “Avg. within centroid distance” and click the blue run button.
Step 5b: Click the “Results” tab to check the cluster model (clustering) and its performance. The Avg. within centroid distance value measures the performance of the model.
Step 6: For the second model, repeat step 3 but select the basis-of-purchase attributes (Pr Cat 1, Pr Cat 2, Pr Cat 3, Pr Cat 4) instead of the purchase behavior attributes (No. of Brands, Brand Runs, Total Volume, No. of Trans, Value, Trans / Brand Runs, Vol/Tran, Avg. Price, PurVol No Promo – %, Pur Vol Promo 6 %, Pur Vol Other Promo %).
Step 7: Repeat step 4 and step 5 to do a cluster analysis.
Step 8: Evaluate the performance of the two models by comparing their average within centroid distances. A smaller average distance indicates a better model.
Step 9: Save the RapidMiner file as in Task 1 (step 7).
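To make the comparison in step 8 concrete, here is a minimal sketch of k-means with Euclidean distance and an average distance from each point to its assigned centroid. It is an illustration under stated assumptions, not the operator's implementation: the two small feature tables are invented stand-ins for the purchase behavior and basis-of-purchase attribute sets, the random initialization and `seed` are arbitrary choices, and the exact formula behind RapidMiner's “Avg. within centroid distance” (for example, whether distances are squared or how the sign is reported) may differ from the plain average used here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k=3, max_iter=100, seed=0):
    """Plain k-means: returns (cluster index per point, list of centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the model has converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


def avg_within_centroid_distance(points, assignment, centroids):
    """Average distance from each point to the centroid of its own cluster."""
    total = sum(euclidean(p, centroids[a]) for p, a in zip(points, assignment))
    return total / len(points)


# Hypothetical stand-ins for the two attribute sets (NOT BathSoap.xls values).
behavior = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (10.0, 10.0), (10.0, 11.0),
            (11.0, 10.0), (20.0, 1.0), (20.0, 2.0), (21.0, 1.0)]
basis = [(0.1, 0.9), (0.2, 0.8), (0.5, 0.5), (0.6, 0.4), (0.9, 0.1),
         (0.8, 0.2), (0.4, 0.6), (0.3, 0.7), (0.7, 0.3)]

for name, data in (("behavior", behavior), ("basis", basis)):
    assignment, centroids = k_means(data, k=3)
    score = avg_within_centroid_distance(data, assignment, centroids)
    print(f"{name}: avg within centroid distance = {score:.3f}")
```

Because the distance depends on the scale and number of attributes, a comparison between models built on different attribute sets is easiest to interpret when the attributes are on comparable scales.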
Step 10: To examine the characteristics of the clusters, first drag the file to the Process view.
Step 11: Select the attributes you have chosen, as in step 3. Make sure the attribute “Member id” is also selected in this model.
Step 12: In the Operators view, type “set role” to search for the “Set Role” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, select “Member id” for attribute name and “id” for target role.
Step 13: Repeat step 4 to run a cluster model.
Step 14: Drag and drop the BathSoap_2014_1 file to the Process view.
Step 15: Repeat step 12 to set the attribute “Member id” as id.
Step 16: In the Operators view, type “join” to search for “join” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, check “remove double attributes” and “use id attribute as key”, and then select “inner” for join type.
Step 17: In the Operators view, type “set role” to search for “set role” operator. Drag and drop the operator to the Process view and connect the operators as follows again. In the Parameters view, select “cluster” for attribute name and “label” for target role.
Step 18: In the Operators view, type “aggregate” to search for the “Aggregate” operator. Drag and drop the operator to the Process view and connect the operators as follows. In the Parameters view, click “Edit list” to edit the aggregation attributes. Use the drop-down button to select “average” for aggregation functions and “age” for aggregation attribute, then click “Add Entry” to add more parameters as follows before clicking the “Apply” button. Click “Select attributes” to move “cluster” from the attributes column to the selected attributes column. Then click the “Apply” button again.
Step 19: Click the blue run button and switch to the “Results” tab. Click the “ExampleSet (Aggregate)” tab to check the average values of the chosen attributes for each cluster.
Step 20: Save the RapidMiner file and answer the questions.
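Steps 12 to 19 amount to an inner join on the id attribute followed by a per-cluster average. The sketch below shows that logic on a few invented rows; the values, the cluster names, and the helper functions `inner_join` and `average_by_group` are hypothetical and only illustrate what the Join and Aggregate operators are being asked to do.

```python
from collections import defaultdict

# Hypothetical rows (NOT BathSoap.xls values).
# Output of the cluster model: the id attribute plus the cluster attribute.
clustered = [
    {"Member id": 1, "cluster": "cluster_0"},
    {"Member id": 2, "cluster": "cluster_1"},
    {"Member id": 3, "cluster": "cluster_0"},
]
# Original data set with the attributes to be averaged.
original = [
    {"Member id": 1, "age": 30, "Value": 100.0},
    {"Member id": 2, "age": 40, "Value": 250.0},
    {"Member id": 3, "age": 50, "Value": 300.0},
    {"Member id": 4, "age": 20, "Value": 80.0},  # no cluster row: dropped by the inner join
]


def inner_join(left, right, key):
    """Keep only rows whose key appears on both sides; shared columns appear once."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**match, **row})
    return joined


def average_by_group(rows, group_key, attributes):
    """Average each listed attribute within every group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: {a: sum(r[a] for r in members) / len(members) for a in attributes}
        for group, members in groups.items()
    }


joined = inner_join(clustered, original, key="Member id")
averages = average_by_group(joined, group_key="cluster", attributes=["age", "Value"])
for cluster, values in sorted(averages.items()):
    print(cluster, values)
```

Comparing the per-cluster averages side by side is what allows each cluster to be described in terms of its typical member.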

