Data Science, R programming wordcloud

Daniel Kevins

I am new to coding and I didn't realize how much time it would take to overcome errors. I need some clarification on the work I have already started, and I want to compare it to your code so I can move on to the next subject. If you do a good job, I will always pick you for the rest of the year. I am in a rush because my assignment is already 1 and a half past due. I have included some code below that my instructor provided, but I don't know if it is similar to the assignment CSV file I have uploaded in a zip file. If you have more questions, please let me know.

Chapter 14: Word Perfect (pages 174-186 of "An Introduction to Data Science" by Jeffrey Saltz and Jeffrey Stanton)

# ----------- Chapter 14: Word Perfect -----------
library(XML)
library(tm)

# Read the speech - the actual file location will need to be updated
sbaFile <- "/Users/jsaltz/Google Drive/Courses/IST 687/2U/Week 8 – Text Mining/data/sba-speech.txt"
# Read the assignment CSV instead (this overwrites the path above)
sbaFile <- read.csv("sample.csv", stringsAsFactors = FALSE)
head(sbaFile)

# Keep only the text column as a character vector
sbaFile <- sbaFile$text
head(sbaFile)

# Use scan
# sba <- scan(sbaFile, character(0), sep = "\n")
# sba <- scan(sbaFile, character(0))
# head(sba, 10)

# Use readLines
# sba <- readLines(sbaFile)
# head(sba, 3)

# Use a web file: note the web location for the speech
sbaLocation <- URLencode("http://www.historyplace.com/speeches/anthony.htm")

# Read and parse HTML file
doc.html <- htmlTreeParse(sbaLocation, useInternalNodes = TRUE)

# Extract all the paragraphs (HTML tag is p, starting at
# the root of the document). Unlist flattens the list to
# create a character vector.
sba <- unlist(xpathApply(doc.html, "//p", xmlValue))
head(sba, 3)
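
# Build a corpus from the character vector
words.vec <- VectorSource(sba)
words.corpus <- Corpus(words.vec)
words.corpus

# Clean the text: lower case, then strip punctuation, numbers, and stop words
words.corpus <- tm_map(words.corpus, content_transformer(tolower))
words.corpus <- tm_map(words.corpus, removePunctuation)
words.corpus <- tm_map(words.corpus, removeNumbers)
words.corpus <- tm_map(words.corpus, removeWords, stopwords("english"))

# Build the term-document matrix (terms in rows, documents in columns)
tdm <- TermDocumentMatrix(words.corpus)
tdm

# Sum each row to get the total count per word, most frequent first
m <- as.matrix(tdm)
wordCounts <- rowSums(m)
wordCounts <- sort(wordCounts, decreasing = TRUE)
head(wordCounts)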

library(wordcloud)
cloudFrame <- data.frame(word = names(wordCounts), freq = wordCounts)

# Basic word cloud
wordcloud(cloudFrame$word, cloudFrame$freq)

# Word cloud limited to words seen at least twice, at most 50 words, with colors
wordcloud(names(wordCounts), wordCounts, min.freq = 2, max.words = 50, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
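
One way to check whether the instructor's steps fit a CSV file is to confirm that the file has a `text` column and then repeat the same cleaning and counting steps in base R, so the result can be compared with the `wordCounts` produced by tm. The sketch below is not the book's method. It assumes a column named `text` (as in `sbaFile$text` above), writes a small made-up CSV as a stand-in because the real `sample.csv` is not shown here, and uses a short illustrative stop list rather than tm's `stopwords("english")`.

```r
# Stand-in for the assignment file: a tiny made-up CSV with a `text` column.
# (Assumption: the real sample.csv is not shown, so its contents are unknown.)
csvPath <- tempfile(fileext = ".csv")
sampleRows <- data.frame(
  id = 1:3,
  text = c("Text mining counts words, and words matter.",
           "A word cloud shows the most frequent words!",
           "In 2024 we counted 3 words: words, words, words."),
  stringsAsFactors = FALSE
)
write.csv(sampleRows, csvPath, row.names = FALSE)

# Read the file and confirm that the column the code relies on exists
speech <- read.csv(csvPath, stringsAsFactors = FALSE)
stopifnot("text" %in% names(speech))
docs <- speech$text

# Same cleaning order as the tm pipeline: lower case, punctuation, numbers
cleaned <- tolower(docs)
cleaned <- gsub("[[:punct:]]", "", cleaned)
cleaned <- gsub("[[:digit:]]", "", cleaned)

# Split on runs of whitespace and drop any empty strings
words <- unlist(strsplit(cleaned, "[[:space:]]+"))
words <- words[nchar(words) > 0]

# Illustrative stop list only (assumption); tm's English list is much longer
stopList <- c("a", "and", "the", "in", "we")
words <- words[!(words %in% stopList)]

# Count each word and sort, most frequent first
wordCounts <- sort(table(words), decreasing = TRUE)
head(wordCounts)
```

Counts from this sketch may differ slightly from the tm results, since `TermDocumentMatrix` by default keeps only terms of at least three characters and tm's stop list is longer than the one used here.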