
COM 1018 JNTU Data Mining WEKA Toolkit Data Analysis & Diverse Sources Report

Course: Data Mining

Submit a single PDF document containing your report, to a maximum of 10 pages.

Please refer to the attached dataset zip file.

Assignment Brief:

A dataset of text is provided in the assignment area on Canvas. Analyse this data using the WEKA toolkit and the tools introduced within this module, comparing two different forms of preprocessing. For example, you may investigate the impact of using stemming, the effect of reducing the number of features, the impact of using term frequency rather than a simple word count, etc.
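To make the word count versus term frequency comparison concrete, here is a minimal Python sketch, separate from WEKA. The tokenizer and the length-normalised definition of term frequency are assumptions chosen for illustration; WEKA's StringToWordVector filter defines its own transforms, so check its options before describing them in your report.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def word_counts(text):
    """Raw number of occurrences of each word."""
    return Counter(tokenize(text))

def term_frequencies(text):
    """Counts divided by document length (an assumed definition of term frequency)."""
    counts = word_counts(text)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

doc = "The cat sat on the mat. The mat was flat."
print(word_counts(doc))
print(term_frequencies(doc))
```

With raw counts, long documents get larger feature values simply because they contain more words; dividing by document length removes that effect, which is one reason the two representations can lead to different classifier behaviour.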

Complete the following tasks:

1. Describe which question you will be investigating (e.g. “is stemming beneficial to improving performance?”, “is the reduction of features beneficial to improving performance?”, etc.) and why you think your choice is an interesting question to investigate.

2. Convert the text dataset into TWO different databases in ARFF format, based on your chosen question. Explain the conversion techniques and parameters that you have used, along with any other pre-processing you wish to do. (Do not include a screenshot of the attributes in WEKA; you need to describe them.)

3. For each database, produce a table and a graph of classification performance against training set size for the following three classifiers: decision tree (J48), Naïve Bayes, and Support Vector Machine. For the Support Vector Machine you must determine the kernel and its parameters.

4. Write a conclusion. You should at least compare the performance of the different learning algorithms on your databases, and answer the question you posed in part (1).

Remember to explain the steps you have taken to complete each task in your report.
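For task 2, it can help to see what an ARFF file looks like before producing one with WEKA. The following Python sketch writes a small word-count database by hand; the function name `to_arff`, the `w_` attribute prefix, and the two toy documents are hypothetical and only illustrate the header and data layout, not the conversion route you are expected to use.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def to_arff(relation, docs, labels):
    """Build ARFF text with one numeric word-count attribute per vocabulary word.

    docs is a list of strings and labels is a parallel list of class names.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    # The w_ prefix (an assumption) keeps word attributes distinct from the class attribute.
    lines += [f"@attribute w_{word} numeric" for word in vocab]
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for doc_counts, label in zip(counts, labels):
        values = [str(doc_counts[word]) for word in vocab]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"

docs = ["good film", "bad film"]
labels = ["pos", "neg"]
arff = to_arff("reviews", docs, labels)
print(arff)
```

The class attribute is placed last because that is the position WEKA's Explorer treats as the class by default. A second database for your comparison would change only how the values in each row are computed.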

Screenshots are typically not required, and should be used sparingly if at all.
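The table for task 3 pairs each training set size with a performance figure measured on held-out data. The Python sketch below shows one possible way to organise that experiment, under several assumptions: the 30% test share, the fixed seed, the toy data, and the majority-class learner are all placeholders, and the learner merely stands in for J48, Naïve Bayes, or an SVM run in WEKA.

```python
import random
from collections import Counter

def majority_fit(train):
    """Placeholder learner: remember the most common label in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def majority_predict(model, features):
    """Placeholder prediction: always return the remembered label."""
    return model

def learning_curve(data, fractions, fit, predict, test_share=0.3, seed=0):
    """Return (training set size, accuracy) rows using one fixed held-out test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_share))
    test, pool = shuffled[:n_test], shuffled[n_test:]
    rows = []
    for fraction in fractions:
        n_train = max(1, int(len(pool) * fraction))
        model = fit(pool[:n_train])
        correct = sum(predict(model, x) == y for x, y in test)
        rows.append((n_train, correct / len(test)))
    return rows

# Toy data: 30 (features, label) pairs, purely for illustration.
data = [({"w": i}, "pos" if i % 3 else "neg") for i in range(30)]
rows = learning_curve(data, [0.25, 0.5, 1.0], majority_fit, majority_predict)
print("train_size  accuracy")
for n_train, accuracy in rows:
    print(f"{n_train:>10}  {accuracy:.3f}")
```

Keeping the test set fixed while only the training portion grows means that differences between rows reflect the amount of training data rather than a change in what is being tested.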
