– A description of the dataset can be found here: https://www.kaggle.com/snapcrack/all-the-news/version/4#_=_
– It’s a dataset of news articles. It includes the text and the website of origin for each article.
– Design a classifier that can accurately predict the website of origin for an article, given its text.
– The work is evaluated based on the achieved accuracy and on the number and type of mechanisms (algorithms, parameter tuning, feature engineering) that you implement.
(I am using Spyder for Python).
Data!
On the website, near the top right of the page, there is a button that says Download (253MB), next to the blue “New Kernel” button. The files (article 1; article 2; article 3) will be in there as Excel worksheets.
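As a starting point for the classifier, here is a minimal baseline sketch, not a finished solution: a bag-of-words multinomial Naive Bayes with add-one smoothing, written with only the Python standard library. The names `tokenize`, `NaiveBayesTextClassifier`, and `accuracy` are made up for illustration. The sketch assumes the article texts and their websites of origin have already been read from the downloaded files into two parallel lists; the actual column names in the files are not given here, so the loading step is left out.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts, with add-one smoothing.

    Hypothetical helper class for illustration; labels are the websites
    of origin, texts are the article bodies.
    """

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of articles
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of word occurrences seen for each label.
        self.total_words = {
            label: sum(counts.values())
            for label, counts in self.word_counts.items()
        }
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior: fraction of training articles from this website.
            score = math.log(self.doc_counts[label] / self.n_docs)
            denom = self.total_words[label] + vocab_size
            for t in tokens:
                # Add-one (Laplace) smoothed log likelihood of the word.
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
```

To use it, hold out part of the articles as a test set, call `fit` on the rest, and compare `predict` on the held-out texts against their true websites with `accuracy`. scikit-learn provides ready-made equivalents (`TfidfVectorizer`, `MultinomialNB`, `LogisticRegression`, `GridSearchCV`) that would cover the feature engineering and parameter tuning parts of the assignment.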

