Simple questions


The first two questions use the Deposits Excel file. You will need to
import it into RapidMiner. It consists of all individual deposits made
at a regional bank in a single day. There are 3510 deposits and four
attributes (columns) in the dataset: the deposit amount; whether the
customer was depositing cash, checks, or both; the branch number; and
whether the transaction was handled by an ATM or a teller.

As the analyst working on the dataset, you have determined
that Branch # is irrelevant. You have also noticed that there are
several “-1” values for the Amount ($) variable, which indicate an error
in processing the deposit. You plan to focus primarily on cash
deposits.

  1. Build a process in RapidMiner that does the following:

    - Selects the Amount ($), Type, and Method attributes (but not Branch #)
    - Removes all rows from the data set with Amount ($) = -1
    - Keeps only rows with Type = “Cash”

    Show a screenshot of the Process panel. (You do not need to include the Parameters panel.)

  2. Run your process from the previous question. Show a screenshot of the
    Statistics output in the Results view, with Amount ($) expanded (that
    is, with the histogram and deviation visible for the Amount ($)
    attribute).
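
For readers who want to check the logic of the three steps outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner process itself. The sample rows are made up, and the values "Check", "ATM", and "Teller" are assumed spellings; only the column names and the "Cash" and -1 values come from the description above.

```python
# Made-up sample rows; the real Deposits file has 3510 rows.
# "Check", "ATM", and "Teller" are assumed spellings of the values.
rows = [
    {"Amount ($)": 120.0, "Type": "Cash", "Branch #": 1, "Method": "ATM"},
    {"Amount ($)": -1, "Type": "Cash", "Branch #": 2, "Method": "Teller"},
    {"Amount ($)": 560.5, "Type": "Check", "Branch #": 1, "Method": "Teller"},
    {"Amount ($)": 40.0, "Type": "Cash", "Branch #": 3, "Method": "Teller"},
]

# Attributes to keep (Branch # is dropped as irrelevant).
KEEP = ["Amount ($)", "Type", "Method"]


def clean_deposits(rows):
    """Apply the three steps: select attributes, drop errors, keep cash."""
    # Step 1: keep only the selected attributes.
    selected = [{name: row[name] for name in KEEP} for row in rows]
    # Step 2: remove rows where the amount is the -1 error marker.
    valid = [row for row in selected if row["Amount ($)"] != -1]
    # Step 3: keep only cash deposits.
    return [row for row in valid if row["Type"] == "Cash"]


cash = clean_deposits(rows)
for row in cash:
    print(row)
```

The order of the two row filters does not change the result; they are kept separate here only to mirror the three steps listed in the question.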

The next three questions use the
“Labor-Negotiations” dataset that comes with RapidMiner. It is located
in the Repository panel, in Samples -> data.

  1. Build a process that uses the Select Attributes and Filter Examples
    operators to obtain a dataset that includes only the duration,
    wage-inc-1st, and working-hours attributes, and only includes examples
    where the value for working-hours is at least 36, and the value for
    duration is not missing. Show a screenshot of this smaller dataset in
    the Results view. Your screenshot does not need to show all of the rows
    in the Results view, but must include at least the first 10.
  2. Of the workers in this smaller data set, what is the mean of wage-inc-1st?
  3. Use the Correlation Matrix operator to create a correlation matrix of
    this smaller data set, and show a screenshot of the matrix. Of the
    three attributes, are there any pairs that appear to be correlated? If
    so, which one(s)?
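
The same three questions can be sketched in plain Python to make the filter conditions and the correlation calculation concrete. This is an illustration under stated assumptions, not the RapidMiner operators. The rows are made up and are not values from Labor-Negotiations, missing values are represented as `None`, the `extra` column is a hypothetical stand-in for the attributes that get dropped, and the correlation is an ordinary Pearson coefficient computed over rows where both values are present.

```python
from math import sqrt
from statistics import mean

KEEP = ["duration", "wage-inc-1st", "working-hours"]

# Made-up rows (NOT the real dataset); None marks a missing value.
rows = [
    {"duration": 1, "wage-inc-1st": 2.0, "working-hours": 40, "extra": "x"},
    {"duration": 2, "wage-inc-1st": 4.0, "working-hours": 38, "extra": "x"},
    {"duration": None, "wage-inc-1st": 4.5, "working-hours": 40, "extra": "x"},
    {"duration": 3, "wage-inc-1st": 6.0, "working-hours": 35, "extra": "x"},
    {"duration": 3, "wage-inc-1st": 5.0, "working-hours": 36, "extra": "x"},
]


def select_and_filter(rows):
    """Keep three attributes; require working-hours >= 36 and a duration."""
    selected = [{name: row[name] for name in KEEP} for row in rows]
    return [
        row for row in selected
        if row["duration"] is not None
        and row["working-hours"] is not None
        and row["working-hours"] >= 36
    ]


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present.

    Assumes at least two pairs and that neither column is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return cov / sqrt(sxx * syy)


subset = select_and_filter(rows)

# Mean of wage-inc-1st over the filtered rows, skipping missing values.
mean_wage = mean(
    row["wage-inc-1st"] for row in subset if row["wage-inc-1st"] is not None
)

# Correlation matrix as a dict keyed by (attribute, attribute).
matrix = {
    (a, b): pearson([row[a] for row in subset], [row[b] for row in subset])
    for a in KEEP
    for b in KEEP
}

print("mean wage-inc-1st:", mean_wage)
for (a, b), r in matrix.items():
    print(f"{a} vs {b}: {r:.3f}")
```

When reading the matrix, values near +1 or -1 suggest a strong linear relationship and values near 0 suggest little or none. The diagonal is always 1 because each attribute is perfectly correlated with itself.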
