Discovery and Learning with Big Data


Allyn Moeller, MBA

Midterm Project

Overview

The project covers all topics discussed through the end of Module 4 of the course. All materials posted for the class activities, in any format, should be considered and used for the project. The student may also use any other source of information that he/she can gather.

IMPORTANT NOTES:

The student can use any source of information that he/she considers the best fit for his/her work on the project.
Sources may include class lectures, assignments, and other course materials, or any outside source.
Images can include the screenshots that the student has taken while working on the class assignments.
Screenshots submitted without an explanation of what they show and why they are included are considered incomplete.
All students are free to discuss with their classmates while working on the midterm project.
The midterm project is an individual assignment. All documents submitted for the midterm project must be the student's own work.
All the datasets used in the lectures and the assignments are posted on Canvas.

PART I: Big Data, Artificial Intelligence, and Machine Learning (100 Points)

SUBMISSION REQUIREMENTS:

Research and discuss (Min: 2 pages, Max: 3 Pages – including images) the history of artificial intelligence until now, focusing on the recent advancement of the field.
Select three different sectors of the U.S. economy, do research, and discuss (Min: 2 pages, Max: 3 Pages – including images) the impacts of big data and machine learning on each of them.
Discuss in detail the three major styles of learning in machine learning (Min: 1 paragraph each):
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
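As a rough illustration of how the three styles differ in the data they are given, the following sketch uses Scikit-Learn on synthetic data. The dataset, the choice of estimators (LogisticRegression, KMeans, SelfTrainingClassifier), and all parameter values are assumptions made for this example only; they are not taken from the course materials.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic, well-separated two-class data (hypothetical, for illustration only).
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]], random_state=0)

# Supervised learning: every training example comes with its label.
supervised = LogisticRegression().fit(X, y)

# Unsupervised learning: no labels are given; similar points are grouped.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Semi-supervised learning: only 10 examples per class keep their label.
# Scikit-Learn marks the unlabeled examples with -1.
y_partial = np.full_like(y, -1)
for label in (0, 1):
    y_partial[np.flatnonzero(y == label)[:10]] = label
semi_supervised = SelfTrainingClassifier(LogisticRegression()).fit(X, y_partial)

print("supervised accuracy:", supervised.score(X, y))
print("cluster sizes:", np.bincount(clusters))
print("semi-supervised accuracy:", semi_supervised.score(X, y))
```

The only difference between the three calls is how much of the label array is visible to the algorithm: all of it, none of it, or a small part of it.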
IMPORTANT NOTES:

The student can select any sector in which he/she is interested. For example, he/she can choose high-tech, retail, and transportation, or healthcare, education, and manufacturing, to name a few.
Best submitted as a Word document
PART II: Machine Learning: Supervised – Linear Regression (60 Points)
TO-DO

Following the steps discussed in class, train a machine learning model using the linear regression algorithm on the full dataset (all columns) housing_boston.csv with the Python library Scikit-Learn.

This machine learning project includes the following steps:

Load the data
Preprocess the dataset
Perform the exploratory data analysis (EDA) on the dataset
Separate the dataset into the input and output NumPy arrays
Split the input/output arrays into the training/testing datasets
Build and train the model
Calculate the R2 value
Predict the “Median value of owner-occupied homes in 1000 dollars”
It is assumed that two new suburbs/towns/developments have been established in the Boston area. The agency has collected the housing data of these two new suburbs/towns/developments.
Make up two housing records to be used as predictors (all the variables except MEDV)
Use these two new records as the new data, feed them into the model to predict the median value of owner-occupied homes in 1000’s of dollars
Evaluate the model using the 10-fold cross-validation
IMPORTANT NOTES

For the univariate data visualization in the Exploratory Data Analysis (EDA), the chart for each applicable variable must be displayed in its own plot.
Run the code of each step to show the results.
For Step 8 (Prediction), clearly present the value used for each predictor.
If uncertain what values to use for the prediction, one approach is to use the means from the summary statistics.
Best submitted as a Markdown document exported as PDF.
Creating Markdown Documents in Jupyter
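Steps 4 through 9 of Part II might look roughly like the sketch below. To keep it self-contained, it generates a small synthetic dataset instead of loading housing_boston.csv; the three stand-in predictors, the 33% test split, the random seeds, and the values of the two made-up records are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Step 4: input (X) and output (Y) NumPy arrays.
# Hypothetical stand-in data; the real project would build these arrays
# from housing_boston.csv, with MEDV as the output.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
Y = 22.0 + X @ np.array([3.0, -2.0, 1.5]) + rng.normal(scale=1.0, size=200)

# Step 5: split the arrays into training and testing sets.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7
)

# Step 6: build and train the model.
model = LinearRegression().fit(X_train, Y_train)

# Step 7: R2 on the held-out test set.
r2 = model.score(X_test, Y_test)
print("R2:", r2)

# Step 8: two made-up records, one row per record, one column per predictor.
new_records = np.array([[0.1, -0.5, 1.2],
                        [1.0, 0.3, -0.7]])
predictions = model.predict(new_records)
print("Predicted values:", predictions)

# Step 9: 10-fold cross-validation, scored with R2.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cv_scores = cross_val_score(model, X, Y, cv=kfold, scoring="r2")
print("Mean CV R2:", cv_scores.mean(), "Std:", cv_scores.std())
```

The made-up records must have the same number of columns, in the same order, as the array used for training.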
PART III: Machine Learning: Supervised – Logistic Regression (60 Points)
TO-DO

Following the steps discussed in the videos, train a machine learning model using the logistic regression algorithm on the dataset pima_diabetes.csv with the Python library Scikit-Learn.

This machine learning project includes the following steps:

Load the data
Preprocess the dataset
Perform the exploratory data analysis (EDA) on the dataset
Separate the dataset into the input and output NumPy arrays
Split the input/output arrays into the training/testing datasets
Build and train the model
Score the accuracy of the model
Predict the outcome (having diabetes or not) of two new records:
It is assumed that new data has been collected from two persons whose information has not yet been included in the existing dataset.
Make up two new records consisting of the predictors (all the variables except “class”) to represent the data of these two new persons, using the existing records of the dataset as samples.
Use these two records as the new data, feed them into the model to predict the outcome, i.e., having diabetes or not
Evaluate the model using the 10-fold cross-validation
IMPORTANT NOTES

For the univariate data visualization in the Exploratory Data Analysis (EDA), the chart for each applicable variable must be displayed in its own plot.
Run the code of each step to show the results.
For Step 8 (Prediction), clearly present the value used for each predictor.
If uncertain what values to use for the prediction, one approach is to use the means from the summary statistics.
Best submitted as a Markdown document exported as PDF.
Creating Markdown Documents in Jupyter
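For the classification case, Steps 5 through 9 of Part III could be sketched as follows. The data here is generated with make_classification rather than loaded from pima_diabetes.csv; the eight stand-in predictors, the 33% test split, max_iter=1000, and the way the two new records are derived from existing rows are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Hypothetical stand-in data: 8 numeric predictors and a binary class.
# The real project would build X and Y from pima_diabetes.csv.
X, Y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=7
)

# Step 5: split the arrays into training and testing sets.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7
)

# Step 6: build and train the model.
model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)

# Step 7: accuracy is the fraction of test records classified correctly.
accuracy = model.score(X_test, Y_test)
print("Accuracy:", accuracy)

# Step 8: two made-up records modeled on existing rows (values shifted).
new_records = X[:2] + 0.25
predicted_class = model.predict(new_records)        # one class label per record
predicted_proba = model.predict_proba(new_records)  # one probability per class
print("Predicted classes:", predicted_class)
print("Class probabilities:", predicted_proba)

# Step 9: 10-fold cross-validation, scored with accuracy.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cv_accuracy = cross_val_score(model, X, Y, cv=kfold, scoring="accuracy")
print("Mean CV accuracy:", cv_accuracy.mean())
```

Reporting predict_proba alongside predict shows how confident the model is in each predicted outcome, which can help when discussing the two new records.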
PART IV: ML: Supervised: Regression: Decision Tree Regression (60 Points)
TO-DO

Following the steps discussed in the videos, train a machine learning model using the decision tree regression algorithm on the full dataset housing_boston.csv with the Python library Scikit-Learn.

This machine learning project includes the following steps:

Load the data
Preprocess the dataset
Perform the exploratory data analysis (EDA) on the dataset
Separate the dataset into the input and output NumPy arrays
Split the input/output arrays into the training/testing datasets
Build and train the model
Calculate the R2 value
Predict the “Median value of owner-occupied homes in 1000 dollars”
It is assumed that two new suburbs/towns have been established in the Boston area. The agency has collected the housing data of these two new suburbs/towns.
Make up two housing records consisting of the predictors (all the variables except MEDV) to represent the housing data of these new towns, using the existing records of the dataset as a reference
Use these two records as the new data, feed them into the model to predict the median value of owner-occupied homes in 1000’s of dollars
Evaluate the model using the 10-fold cross-validation
IMPORTANT NOTES

For the univariate data visualization in the Exploratory Data Analysis (EDA), the chart for each applicable variable must be displayed in its own plot.
Run the code of each step to show the results.
For Step 8 (Prediction), clearly present the value used for each predictor.
If uncertain what values to use for the prediction, one approach is to use the means from the summary statistics.
Best submitted as a Markdown document exported as PDF.
Creating Markdown Documents in Jupyter
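The suggestion to build prediction records from the summary-statistics means can be sketched with a decision tree regressor as below. The synthetic data, the four stand-in predictors, and the choice to make the second record "mean plus one standard deviation" are assumptions for illustration; the real project would use the columns of housing_boston.csv.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data with 4 predictors and a numeric target.
rng = np.random.default_rng(7)
X = rng.uniform(0.0, 10.0, size=(300, 4))
Y = (5.0 + 2.0 * X[:, 0]
     + np.where(X[:, 1] > 5.0, 8.0, 0.0)
     + rng.normal(scale=0.5, size=300))

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7
)

# Step 6: build and train the decision tree (CART) regressor.
model = DecisionTreeRegressor(random_state=7).fit(X_train, Y_train)

# Step 7: R2 on the held-out test set.
r2 = model.score(X_test, Y_test)
print("R2:", r2)

# Step 8: record 1 uses the column means; record 2 shifts each mean up
# by one standard deviation so the two records differ.
means = X.mean(axis=0)
stds = X.std(axis=0)
new_records = np.vstack([means, means + stds])
for i, record in enumerate(new_records, start=1):
    print(f"Record {i} predictor values:", np.round(record, 3))
predictions = model.predict(new_records)
print("Predicted values:", predictions)

# Step 9: 10-fold cross-validation, scored with R2.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cv_r2 = cross_val_score(model, X, Y, cv=kfold, scoring="r2")
print("Mean CV R2:", cv_r2.mean(), "Std:", cv_r2.std())
```

Printing each record before predicting is one way to satisfy the requirement that the value of every predictor be presented clearly.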
PART V: ML: Supervised: Classification: K-Nearest Neighbors (60 Points)
TO-DO

Following the steps discussed in the videos, train a machine learning model using the K-Nearest Neighbors (KNN) algorithm on the dataset pima_diabetes.csv with the Python library Scikit-Learn.

This machine learning project includes the following steps:

Load the data
Preprocess the dataset
Perform the exploratory data analysis (EDA) on the dataset
Separate the dataset into the input and output NumPy arrays
Split the input/output arrays into the training/testing datasets
Build and train the model
Score the accuracy of the model
Predict the outcome (having diabetes or not) of two new records:
It is assumed that new data has been collected from two persons whose information has not yet been included in the existing dataset.
Make up two new records consisting of the predictors (all the variables except “class”) to represent the data of these two new persons, using the existing records of the dataset as samples.
Use these two records as the new data, feed them into the model to predict the outcome, i.e., having diabetes or not.
Evaluate the model using the 10-fold cross-validation technique.
IMPORTANT NOTES

For the univariate data visualization in the Exploratory Data Analysis (EDA), the chart for each applicable variable must be displayed in its own plot.
Run the code of each step to show the results.
For Step 8 (Prediction), clearly present the value used for each predictor.
If uncertain what values to use for the prediction, one approach is to use the means from the summary statistics.
Best submitted as a Markdown document exported as PDF.
Creating Markdown Documents in Jupyter
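A possible shape for Steps 5 through 9 of Part V, using KNeighborsClassifier, is sketched below. The synthetic data, n_neighbors=5, the 33% test split, and the way the two new records are built by averaging pairs of existing rows are assumptions for illustration, not values prescribed by the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: 8 numeric predictors and a binary class.
# The real project would build X and Y from pima_diabetes.csv.
X, Y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=7
)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7
)

# Step 6: build and train the model (k = 5 is an assumed value).
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, Y_train)

# Step 7: accuracy on the held-out test set.
accuracy = accuracy_score(Y_test, model.predict(X_test))
print("Accuracy:", accuracy)

# Step 8: two made-up records, each the average of two existing rows.
new_records = np.array([(X[0] + X[1]) / 2.0,
                        (X[2] + X[3]) / 2.0])
predictions = model.predict(new_records)
print("Predicted classes:", predictions)

# The training rows that drove each prediction can be inspected as well.
distances, indices = model.kneighbors(new_records)
print("Nearest training-row indices:", indices)

# Step 9: 10-fold cross-validation, scored with accuracy.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
cv_accuracy = cross_val_score(model, X, Y, cv=kfold, scoring="accuracy")
print("Mean CV accuracy:", cv_accuracy.mean())
```

Because KNN classifies by distance, predictors on very different scales can dominate the result; whether to rescale the data is a design choice left to the student.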
PART VI: Evaluate and Compare Machine Learning Models (60 Points)
Regression Models: Linear Regression vs. Decision Tree (CART) Regression

TO-DO

Based on the results obtained in Step 7 of the above exercises (Calculate R2 value), make observations and compare the values. (1-2 paragraphs)
Based on the results obtained in Step 8 (Prediction), make observations and compare the results. (1-2 paragraphs)
Based on the results obtained in Step 9 (Evaluate the model using 10-fold cross-validation), make observations and compare the results. (1-2 paragraphs)
IMPORTANT NOTES

Use these values to evaluate the quality of the models. If the results are not the same, what is the difference?
Make a conclusion, if possible, on which model has higher quality in predicting outcomes and should be selected as the model to make predictions on the new data
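One way to make the Step 9 comparison fair is to evaluate both regression models on exactly the same folds. The sketch below does this on synthetic data; the data, the seeds, and the dictionary of models are assumptions for illustration, and the numbers it prints say nothing about the results on housing_boston.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data; replace with the arrays from housing_boston.csv.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
Y = 22.0 + X @ np.array([3.0, -2.0, 1.5]) + rng.normal(scale=1.0, size=200)

# A fixed random_state makes KFold produce the same folds for both models.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree (CART)": DecisionTreeRegressor(random_state=7),
}

# R2 per fold for each model.
results = {
    name: cross_val_score(model, X, Y, cv=kfold, scoring="r2")
    for name, model in models.items()
}

for name, scores in results.items():
    print(f"{name}: mean R2 = {scores.mean():.3f}, std = {scores.std():.3f}")
```

A higher mean R2 with a smaller standard deviation across folds generally points to the more reliable model, though the conclusion should rest on the actual project data.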
Classification Models: Logistic Regression vs. K-Nearest Neighbors
TO-DO

Based on the results obtained in Step 7 (Score the accuracy level of the model), make observations and compare the values.
Based on the results obtained in Step 8 (Prediction), make observations and compare the results.
Based on the results obtained in Step 9 (Evaluate the model using 10-fold cross-validation), make observations and compare the results.
IMPORTANT NOTES
Use these values to evaluate the quality of the models. If the results are not the same, what is the difference?
Make a conclusion, if possible, on which model has higher quality in predicting outcomes and should be selected as the model to make predictions on the new data
Best submitted as a Word document
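For the classification comparison, the Step 7 accuracies and the Step 8 predictions of the two models can be placed side by side, as in the sketch below. The synthetic data, the parameter values, and the two new records are assumptions for illustration; the real comparison would use the models trained on pima_diabetes.csv.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: 8 numeric predictors and a binary class.
X, Y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=7
)

# Both models are trained and tested on the same split.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7
)
logreg = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, Y_train)

# Step 7 comparison: accuracy of each model on the same test set.
accuracies = {
    "Logistic Regression": logreg.score(X_test, Y_test),
    "K-Nearest Neighbors": knn.score(X_test, Y_test),
}
for name, value in accuracies.items():
    print(f"{name}: accuracy = {value:.3f}")

# Step 8 comparison: feed the same two new records to both models.
new_records = X[:2] + 0.25
pred_logreg = logreg.predict(new_records)
pred_knn = knn.predict(new_records)
agreement = pred_logreg == pred_knn
print("Logistic Regression predictions:", pred_logreg)
print("K-Nearest Neighbors predictions:", pred_knn)
print("Models agree on each record:", agreement)
```

Where the two models disagree on a new record, that record is a natural starting point for the written observations.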
Grading Criteria
The mid-term project is graded based on the following grade components:

  1. Mid-term project report: 25%
  2. Mid-term project Python exercises: 75%

Mid-term Project Report (400 Points)
PART I: Big Data, Artificial Intelligence, and Machine Learning (100 Points)
PART II: ML: Supervised Regression – Linear Regression (60 Points)
PART III: ML: Supervised Classification – Logistic Regression (60 Points)
PART IV: ML: Supervised Regression – Decision Tree (CART) (60 Points)
PART V: ML: Supervised Classification – K-Nearest Neighbors (60 Points)
PART VI: Evaluate and Compare Machine Learning Models (60 Points)

How to Submit
Project, Reports and All Related Documents
The student is required to submit the final project report and all related documents to Canvas.
