The objective of this Portfolio Project is to mine data from a data warehouse populated with data from the Northwind database, which was created during your installation of PostgreSQL.
The tasks for this project are summarized below.
***This project requires Pentaho Data Integration (ETL), PostgreSQL, and SAS University Edition (all free downloads).***
Data Warehouse:
- Create a data warehouse database (MUST USE DATA FROM NORTHWIND DATABASE: https://github.com/pthom/northwind_psql/blob/maste…), including the fact and dimension tables (star schema).
- Create the schema for each table.
- Populate the tables using either ETL (Pentaho) or SQL (PostgreSQL).
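The SQL route for building and populating a star schema can be sketched as follows. This is a minimal illustration, not a required design: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so it runs without a server, and the source tables are a hypothetical, trimmed-down imitation of Northwind's customers, products, orders, and order details (check real column names against your installed copy). The names `dim_customer`, `dim_product`, `dim_date`, `fact_sales`, and the derived `sales_amount` measure are illustrative assumptions. In PostgreSQL you would replace SQLite's `strftime()` date handling with functions such as `to_char()` or `EXTRACT()`.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, and these source tables are a
# hypothetical, trimmed-down imitation of the Northwind tables.
SOURCE_DDL = """
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, company_name TEXT, country TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, category_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, order_date TEXT);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER,
                            unit_price REAL, quantity INTEGER, discount REAL);
"""

# Star schema: one fact table whose foreign keys point at three dimension tables.
STAR_DDL = """
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT,
                           company_name TEXT, country TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_id INTEGER,
                          product_name TEXT, category_id INTEGER);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT,
                       year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    order_id     INTEGER,
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    date_key     INTEGER REFERENCES dim_date (date_key),
    quantity     INTEGER,
    unit_price   REAL,
    discount     REAL,
    sales_amount REAL
);
"""

# Load the dimensions first, then the fact table, using INSERT ... SELECT.
LOAD_SQL = """
INSERT INTO dim_customer (customer_id, company_name, country)
    SELECT customer_id, company_name, country FROM customers;
INSERT INTO dim_product (product_id, product_name, category_id)
    SELECT product_id, product_name, category_id FROM products;
INSERT INTO dim_date (date_key, full_date, year, month)
    SELECT DISTINCT CAST(strftime('%Y%m%d', order_date) AS INTEGER), order_date,
           CAST(strftime('%Y', order_date) AS INTEGER),
           CAST(strftime('%m', order_date) AS INTEGER)
    FROM orders;
INSERT INTO fact_sales
    SELECT o.order_id, c.customer_key, p.product_key,
           CAST(strftime('%Y%m%d', o.order_date) AS INTEGER),
           d.quantity, d.unit_price, d.discount,
           d.unit_price * d.quantity * (1 - d.discount)
    FROM order_details AS d
    JOIN orders AS o ON o.order_id = d.order_id
    JOIN dim_customer AS c ON c.customer_id = o.customer_id
    JOIN dim_product AS p ON p.product_id = d.product_id;
"""

def build_warehouse(conn):
    """Create source and star-schema tables, add sample rows, then load the star schema."""
    conn.executescript(SOURCE_DDL + STAR_DDL)
    # Hypothetical sample rows so the sketch has something to load.
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [("C1", "Customer One", "Germany"), ("C2", "Customer Two", "France")])
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                     [(1, "Product A", 1), (2, "Product B", 2)])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(100, "C1", "1996-07-04"), (101, "C2", "1996-07-05")])
    conn.executemany("INSERT INTO order_details VALUES (?, ?, ?, ?, ?)",
                     [(100, 1, 14.0, 12, 0.0), (100, 2, 9.8, 10, 0.0), (101, 1, 14.0, 5, 0.25)])
    conn.executescript(LOAD_SQL)
    conn.commit()

conn = sqlite3.connect(":memory:")
build_warehouse(conn)
print(conn.execute("SELECT COUNT(*), SUM(sales_amount) FROM fact_sales").fetchone())
```

Loading the dimensions before the fact table lets the fact load look up surrogate keys with ordinary joins; a Pentaho transformation would typically follow the same order.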
Preprocessing for SAS:
- Extract data from the data warehouse into a file for input into SAS. You may choose the file format, but make sure SAS University Edition can import it.
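One common choice of SAS input file is a CSV with a header row. The sketch below assumes you query the warehouse through a Python DB-API driver (for example psycopg2 for PostgreSQL, or `sqlite3` as used here so the example runs without a server); the table `sales_flat` and its columns are hypothetical stand-ins for a join of the fact table with its dimensions. If you prefer to stay in psql, its `\copy` meta-command can do the same job, e.g. `\copy (SELECT * FROM fact_sales) TO 'sales.csv' WITH CSV HEADER`.

```python
import csv
import io
import sqlite3

def export_query_to_csv(cursor, query, out):
    """Run `query` on a DB-API cursor and write the result to `out` as CSV.

    The first row holds the column names, which SAS PROC IMPORT can read as
    variable names (GETNAMES=YES).
    """
    cursor.execute(query)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor.fetchall())

# Hypothetical flattened table, standing in for fact_sales joined with its dimensions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_flat (order_id INTEGER, country TEXT, "
             "quantity INTEGER, sales_amount REAL)")
conn.executemany("INSERT INTO sales_flat VALUES (?, ?, ?, ?)",
                 [(100, "Germany", 12, 168.0), (101, "France", 5, 52.5)])

buffer = io.StringIO()  # for a real file: open("sales.csv", "w", newline="")
export_query_to_csv(conn.cursor(), "SELECT * FROM sales_flat ORDER BY order_id", buffer)
print(buffer.getvalue())
```

Exporting one denormalized (flattened) table keeps the SAS side simple: a single PROC IMPORT with DBMS=CSV yields one data set with one variable per column.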
Statistical Analysis Using SAS:
- Import data created in the preprocessing step.
- Conduct statistical analysis using the appropriate statistics from each category:
  - Summary statistics
  - Classification
  - Clustering
  - Association
- Prepare an analysis report.
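In SAS, procedures such as PROC MEANS (summary statistics) and PROC CORR (correlation) produce these figures directly. The Python sketch below only illustrates what they calculate, which may help when checking or explaining SAS output; the variables `quantity` and `sales_amount` and their values are hypothetical.

```python
import statistics

def summarize(values):
    """Summary statistics for one numeric variable: PROC MEANS defaults plus the median."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }

def pearson(x, y):
    """Pearson correlation r between two numeric variables of equal length."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical columns from the exported file.
quantity = [12, 10, 5, 20]
sales_amount = [168.0, 98.0, 52.5, 280.0]
print(summarize(quantity))
print(round(pearson(quantity, sales_amount), 3))
```

A value of r near +1 or -1 indicates a strong linear association between two variables; a value near 0 indicates little linear association, although a non-linear relationship may still exist.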
Your analysis report must include:
- An analysis of each variable in the data set
- An analysis to determine which variables could serve as appropriate classifier variables
- An analysis to determine if any variables are candidates for clustering
- An analysis to determine if any variables have associations
- Any tables, histograms, or scatterplots needed to support your analyses
- A recommendation as to the suitability of this data set for meeting your organization’s business goal
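To judge whether variables are candidates for clustering, SAS provides PROC FASTCLUS, which performs k-means-style disjoint clustering. As a hedged illustration of what such a procedure does, here is plain k-means (Lloyd's algorithm) in Python; the two-variable points and the choice of k = 2 are hypothetical assumptions, not values from Northwind.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k randomly chosen input points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid unchanged
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical two-variable points, assumed to be already standardized.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because k-means relies on Euclidean distance, variables measured on very different scales (such as quantity versus sales amount) should normally be standardized first; otherwise the larger-scale variable dominates the clusters.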
Your project must meet the following requirements:
- Be 6-8 pages in length, not including the cover and references pages.
- Follow the APA guidelines. Your paper should include an introduction, a body with at least four fully developed paragraphs, and a conclusion.
- Cite a minimum of two sources.