The first two questions use the Deposits Excel file. You will
need to import it into RapidMiner. It consists of all individual
deposits made at a regional bank in a single day. There are 3510
deposits, and four attributes (columns) in the dataset: the deposit
amount; whether the customer was depositing cash, checks, or both; the
branch number; and whether the transaction was handled by an ATM or a
teller.
As the analyst working on the dataset, you have determined
that Branch # is irrelevant. You have also noticed that there are
several “-1” values for the Amount ($) variable, which indicate an error
in processing the deposit. You plan to focus primarily on cash
deposits.
- Build a process in RapidMiner that does the following:
  - Selects the Amount ($), Type, and Method attributes (but not Branch #)
  - Removes all rows from the data set with Amount ($) = -1
  - Keeps only rows with Type = “Cash”
  Show a screenshot of the Process panel. (You do not need to include the Parameters panel.)
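As a cross-check outside RapidMiner, the three steps above can be sketched in plain Python. This is only an illustration: the rows are made-up values rather than data from the Deposits file, and the labels "Check", "ATM", and "Teller" are assumptions about how the file encodes Type and Method.

```python
# Toy rows standing in for the Deposits file (illustrative values only).
deposits = [
    {"Amount ($)": 120.0, "Type": "Cash", "Branch #": 1, "Method": "ATM"},
    {"Amount ($)": -1, "Type": "Cash", "Branch #": 2, "Method": "Teller"},
    {"Amount ($)": 560.5, "Type": "Check", "Branch #": 1, "Method": "Teller"},
    {"Amount ($)": 75.0, "Type": "Cash", "Branch #": 3, "Method": "Teller"},
]

# Attributes to keep; Branch # is deliberately left out.
KEEP = ("Amount ($)", "Type", "Method")


def clean_deposits(rows):
    """Select three attributes, drop error rows, keep cash deposits."""
    # Step 1: attribute selection (analogous to Select Attributes).
    selected = [{name: row[name] for name in KEEP} for row in rows]
    # Step 2: remove rows flagged with the -1 error code.
    valid = [row for row in selected if row["Amount ($)"] != -1]
    # Step 3: keep only cash deposits.
    return [row for row in valid if row["Type"] == "Cash"]


cash_rows = clean_deposits(deposits)
for row in cash_rows:
    print(row)
```

The row count left after each step is a quick way to confirm that the corresponding RapidMiner operators are configured as intended.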
- Run your process from the previous question. Show a screenshot of the
Statistics output in the Results view, with Amount ($) expanded (that
is, with the histogram and deviation visible for the Amount ($)
attribute).
The next three questions use the
“Labor-Negotiations” dataset that comes with RapidMiner. It is located
in the Repository panel, in Samples -> data.
- Build a process that uses the Select Attributes and Filter Examples operators to obtain a dataset that includes only the duration, wage-inc-1st, and working-hours attributes, and only includes examples where the value for working-hours is at least 36 and the value for duration is not missing. Show a screenshot of this smaller dataset in the Results view. Your screenshot does not need to show all of the rows in the Results view, but must include at least the first 10.
- Of the workers in this smaller data set, what is the mean of wage-inc-1st?
- Use the Correlation Matrix operator to create a correlation matrix of this
smaller data set, and show a screenshot of the matrix. Of the three
attributes, are there any pairs that appear to be correlated? If so,
which one(s)?
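The filter, the mean, and the pairwise correlations asked for in these three questions can be sketched in plain Python as a way to reason about what the operators compute. The rows below are invented for illustration and are not taken from Labor-Negotiations; `None` stands in for a missing value, and the helper names `keep` and `pearson` are hypothetical. The toy subset has no missing wage-inc-1st values, so how RapidMiner treats missing values inside the Correlation Matrix operator is not modeled here.

```python
from math import sqrt
from statistics import mean

# Toy rows (illustrative only); None marks a missing value.
rows = [
    {"duration": 1, "wage-inc-1st": 2.0, "working-hours": 40},
    {"duration": None, "wage-inc-1st": 4.5, "working-hours": 38},
    {"duration": 2, "wage-inc-1st": 3.0, "working-hours": 35},
    {"duration": 3, "wage-inc-1st": 4.0, "working-hours": 36},
    {"duration": 2, "wage-inc-1st": 3.0, "working-hours": 38},
]


def keep(row):
    """working-hours is at least 36 and duration is not missing."""
    hours = row["working-hours"]
    return hours is not None and hours >= 36 and row["duration"] is not None


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


subset = [row for row in rows if keep(row)]
mean_wage = mean(row["wage-inc-1st"] for row in subset)
print("mean wage-inc-1st:", mean_wage)

# Correlation for every pair of the three attributes.
names = ["duration", "wage-inc-1st", "working-hours"]
correlations = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson([row[a] for row in subset], [row[b] for row in subset])
        correlations[(a, b)] = r
        print(a, "vs", b, "->", round(r, 3))
```

Coefficients near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none. The toy rows were chosen to give extreme values and say nothing about the real dataset.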

