Data Analysis

School of Computing

Module Coordinator:
Other lecturers:
Date Issued:
Code: COSREP / M26538
Title: Applied Machine Learning and Data Mining

Schedule and Deliverables

Item: Coursework
Value: 60%
Format: One report file (.pdf) and a single .zip file containing the Python source code (upload it to your GitHub repository)
Deadline:
Late deadline:
ECF deadline:

Notes and Advice

  • The Extenuating Circumstances procedure is there to support you if you have had any circumstances (problems) that have been serious or significant enough to prevent you from attending, completing or submitting an assessment on time.
  • ASDAC is available to any student who discloses a disability or requires additional support for their academic studies; a good set of resources is available on the ASDAC Moodle site.
  • The University takes plagiarism seriously. Please ensure you adhere to the plagiarism guidelines and watch the video on plagiarism.
  • Any material included in your coursework should be fully cited and referenced in APA 7 format. Detailed advice on referencing is available from the library.
  • Any material that does not meet the format or submission guidelines, or that is submitted after the deadline, could be subject to a cap on your overall result or to disqualification.
  • If you need additional assistance, you can ask your personal tutor, student engagement officer, academic tutor or your lecturers.

First Submission - Supervised Learning

Task I: Classification using Python

Download the following datasets, which reflect different machine learning and data mining applications.

Medical:

  1. Medical Data https://www.kaggle.com/dansbecker/hospital-readmissions
  2. Heart attack prediction https://www.kaggle.com/imnikhilanand/heart-attack-prediction

Finance:

  1. Banking https://www.kaggle.com/janiobachmann/bank-marketing-dataset
  2. Loan prediction https://www.kaggle.com/ninzaami/loan-predication

Earth and Nature:

  1. Mushroom Classification https://www.kaggle.com/uciml/mushroom-classification
  2. Weather https://www.kaggle.com/jsphyg/weather-dataset-rattle-package/download

Retail:

  1. Online shoppers’ intention https://kaggle.com/roshansharma/online-shoppers-intention
  2. Ecommerce data https://www.kaggle.com/carrie1/ecommerce-data

In addition, select an application of your choice, search for two different datasets using https://www.kaggle.com/datasets

Task:

You are required to apply the following classification techniques using Python on all the datasets.

  1. Decision tree
  2. K-NN (with K taking the value of 1 up to the number of class labels in the dataset).
  3. Naive Bayes
  4. An algorithm of your choice. 
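As an illustration only, the sketch below shows one way the first three techniques could be set up with scikit-learn. It assumes a synthetic three-class dataset from `make_classification` as a stand-in for a downloaded dataset, a 70/30 stratified split, and `GaussianNB` as the Naive Bayes variant; none of these choices is prescribed by this brief. K-NN is run once for each K from 1 up to the number of class labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data standing in for one of the coursework datasets.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Number of distinct class labels; K-NN uses K = 1 .. n_classes.
n_classes = int(np.unique(y).size)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
for k in range(1, n_classes + 1):
    models[f"K-NN (k={k})"] = KNeighborsClassifier(n_neighbors=k)

# Fit each model on the training split and record test-set accuracy.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

The fourth algorithm is left out on purpose; any scikit-learn classifier can be added to the `models` dictionary in the same way.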

Once you have applied the algorithms to all the datasets, complete the following tasks:

  • Compare the performance of the applied techniques in terms of accuracy.
  • Analyse the results with regard to the dataset properties.
  • You can use exploratory data analysis techniques (visualisation) to explore the datasets and analyse the results.
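When relating results to dataset properties, it may help to record the same few properties for every dataset. The following is a minimal sketch using pandas; the toy DataFrame, its column names and the helper `describe_dataset` are hypothetical, and the properties listed are only one possible selection.

```python
import pandas as pd

# Hypothetical toy data standing in for a downloaded CSV file.
df = pd.DataFrame({
    "age": [63, 37, 41, None, 57, 45],
    "sex": ["M", "F", "F", "M", "M", "F"],
    "chol": [233, 250, 204, 236, None, 199],
    "target": [1, 0, 0, 1, 1, 0],
})


def describe_dataset(frame, label):
    """Summarise properties that may help explain classifier accuracy."""
    features = frame.drop(columns=[label])
    class_share = frame[label].value_counts(normalize=True)
    return {
        "rows": len(frame),
        "features": features.shape[1],
        "numeric_features": features.select_dtypes(include="number").shape[1],
        "categorical_features": features.select_dtypes(exclude="number").shape[1],
        "missing_cells": int(features.isna().sum().sum()),
        "classes": int(frame[label].nunique()),
        "majority_class_share": float(class_share.max()),
    }


summary = describe_dataset(df, "target")
for key, value in summary.items():
    print(f"{key}: {value}")
```

Collecting one such summary per dataset gives a table that can be placed next to the accuracy figures, for example to check whether a high accuracy simply reflects a large majority class.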

Task II: Regression using Python

Download the following datasets, which reflect different machine learning and data mining applications.

Social networks:

  1. Facebook metrics https://archive.ics.uci.edu/ml/datasets/Facebook+metrics

Medical:

  1. Fertility: https://archive.ics.uci.edu/ml/datasets/Fertility

In addition, select an application of your choice and search for a dataset using https://archive.ics.uci.edu/ml/index.php.

Task:

You are required to apply the following to all the datasets using Python:

  1. Linear Regression.
  2. An algorithm of your choice.

Once you have applied the algorithms to all the datasets, complete the following tasks:

  • Compare the performance of the applied techniques.
  • Analyse the results with regard to the dataset properties.
  • You can use exploratory data analysis techniques (visualisation) to explore the datasets and analyse the results.
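The brief does not name the metrics to use for regression, so the sketch below is only one possible setup. It assumes synthetic data from `make_regression` in place of a downloaded dataset, a decision tree regressor as an arbitrary "algorithm of your choice", and MAE, RMSE and R² on a 30% hold-out split as the comparison measures.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data standing in for one of the coursework datasets.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

regressors = {
    "Linear regression": LinearRegression(),
    # Arbitrary second technique, chosen here only for illustration.
    "Decision tree regressor": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# Fit each regressor and compute hold-out error metrics.
results = {}
for name, model in regressors.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Because the synthetic target here is linear in the features, linear regression is expected to do well; on the real datasets the ranking may differ, which is exactly what the analysis should discuss.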

The deliverables for this coursework are:

  • A report documenting Task I and Task II in no more than 1500 words excluding figures and tables. Your report must cover the following areas:
    • A short summary of the machine learning applications and datasets you used and a justification of the chosen datasets.
    • A detailed analysis of your results when comparing the different regression techniques.

The submission is online through Moodle (the submission details will be available on Moodle).

Please ensure that your coursework is anonymous. Your NAME must not appear anywhere on the coursework or the cover sheet. Please use your ID only.

This component of your coursework contributes 60% of the total mark. The marking criteria for this component, as a breakdown of 100% of its marks, are as follows:

20%     Justification of choice and the datasets used

20%     Appropriate use of tables and figures when reporting the results

30%     Analysis of the results of the experiments you have conducted

10%     Conclusion with recommendations on how to match a dataset to a technique

10%     Organisation, language style and clarity

10%     Python Code
