
MIS 655 GCU Pros and Cons of Naïve Bayes Classifier Discussion Responses


Discussion 1: Meredith

Pro

  1. Predicting the class of observations in the test data set is fast and easy.
  2. Naïve Bayes performs well with categorical variables as well as numerical ones.
  3. Only a small amount of training data is required to run the algorithm.

Con

  1. If a category exists in the test data that was not in the training data, then it will be assigned a probability of 0.
  2. Naïve Bayes assumes that predictors are independent, but in actuality it is rare for all predictors to be completely independent of one another.
  3. It is known to be a poor estimator of class probabilities, even when its class predictions are accurate.

The first Pro concerns the speed and ease of use of the algorithm. Many algorithms are slow (neural networks) and/or complex (random forests). Having an algorithm that can be run quickly saves time and sanity. Next, Naïve Bayes can handle numerical variables, but typically does so by assuming they follow a normal distribution within each class. It handles categorical variables best, and when classifying data, the information is often in categorical form. Lastly, because only a limited amount of training data is needed, you can potentially avoid overfitting (Ray, 2017).
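
The speed and small-data points above can be sketched from scratch in a few lines of Python. The weather/play rows below are a made-up toy data set for illustration only: a handful of rows is enough to estimate the counts the classifier needs, and prediction is a single multiply-and-compare pass.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: weather value -> play? label
rows = [
    ("sunny", "yes"), ("sunny", "no"), ("rainy", "no"),
    ("rainy", "no"), ("overcast", "yes"), ("sunny", "yes"),
]

# One pass over the data: class priors and per-class feature frequencies
class_counts = Counter(label for _, label in rows)
feature_counts = defaultdict(Counter)
for value, label in rows:
    feature_counts[label][value] += 1

def predict(value):
    """Score each class by P(class) * P(value | class) and take the max."""
    scores = {}
    for label, n in class_counts.items():
        prior = n / len(rows)
        likelihood = feature_counts[label][value] / n
        scores[label] = prior * likelihood
    return max(scores, key=scores.get)

print(predict("sunny"))  # most "sunny" rows were labeled "yes"
```

Training here is just counting, which is why so little data and time are required compared with iterative methods.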

For the Cons, forcing a probability of 0 onto unseen data is not beneficial to the overall outcome; it is much better to have a nonzero probability assigned to each possible category. The algorithm is much faster than others, but that speed is due in part to the assumption that predictors are independent. This makes sense because the model is based on Bayes’ Theorem for conditional probability. In practice it is very rare for predictors to be truly independent of one another, and because the model assumes they are, its probability estimates are known to be inaccurate (Yildirim, 2020).
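
The zero-probability Con can be made concrete with a toy numeric sketch (the frequencies below are hypothetical, not from the sources): a single unseen category value multiplies the whole class score down to zero.

```python
# Per-class relative frequencies of one categorical feature, estimated from a
# hypothetical training set that never contained the value "foggy".
p_value_given_yes = {"sunny": 0.6, "rainy": 0.1, "overcast": 0.3}
prior_yes = 0.5

def score(value):
    """Naive Bayes score for class 'yes': prior * likelihood."""
    return prior_yes * p_value_given_yes.get(value, 0.0)  # unseen -> 0

print(score("sunny"))  # 0.3
print(score("foggy"))  # 0.0 -- the class is ruled out entirely
```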

Ray, S. (2017, September 11). 6 easy steps to learn naïve Bayes algorithm with codes in Python and R. Analytics Vidhya. Retrieved November 22, 2021, from https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/

Yildirim, S. (2020, February 14). Naïve Bayes classifier – explained. Towards Data Science. Retrieved November 22, 2021, from https://towardsdatascience.com/naive-bayes-classifier-explained-50f9723571ed

Discussion 2: Tyler

Pros of the Naïve Bayes classifier:

  • It is a very fast algorithm that can “easily predict the class of a test dataset” (Vadapalli, 2020).
  • It can be used to solve multi-class prediction problems.
  • It performs better than many other models in cases where less data is available if the assumption of independence of features holds (Vadapalli, 2020).
  • It performs exceptionally well with categorical input variables.

Cons of the Naïve Bayes classifier:

  • If a categorical value appears after training that the model has never seen, it will be assigned zero probability and the model will not make proper predictions. This is often referred to as ‘Zero Frequency’ (Vadapalli, 2020).
  • This model is not a great estimator.
  • The model runs on the assumption that all features are independent of one another, which is rarely realistic but is situationally acceptable.

Regarding the pros of the Naïve Bayes classifier model, it is clear how many of these make the model a great choice in many circumstances. A fast model means fewer resources are necessary to run it efficiently. Its strength as a predictor stands out especially when less data is available, which may be the case for many relatively new projects. Experiencing these benefits with a classification model is a big deal for companies.

There are always circumstantial downsides to every machine learning model, and while this model has several that seem like big deals, there are still cases where they can be overcome. For example, when the zero frequency problem occurs in your dataset, it can be overcome through smoothing (Vadapalli, 2020). Knowing that this model is not a great probability estimator is also important so that you don’t rely on it for that purpose. Regarding the final con, being required to assume that all of your features are independent of one another can seem like a very large downside, but in many circumstances it may not be a deal breaker, especially when you are working on a classification problem with a relatively small dataset. In a case like that, the pros may well outweigh the cons.
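
The smoothing fix for the zero frequency problem is small enough to show directly. This is a sketch of add-alpha (Laplace) smoothing on made-up counts for one feature within one class, where "foggy" was never observed: every category gets alpha phantom observations, so no estimate can be exactly zero.

```python
# Hypothetical raw counts of a categorical feature within one class.
counts = {"sunny": 6, "rainy": 1, "overcast": 3, "foggy": 0}
n = sum(counts.values())  # 10 observations
k = len(counts)           # 4 possible categories

def smoothed(value, alpha=1.0):
    """Add-alpha (Laplace) estimate: (count + alpha) / (n + alpha * k)."""
    return (counts[value] + alpha) / (n + alpha * k)

print(smoothed("foggy"))  # 1/14 -- small, but no longer zero
print(smoothed("sunny"))  # 7/14 = 0.5
```

The estimates still sum to 1 across the categories, so the smoothed values remain a valid probability distribution.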

References

Vadapalli, P. (2020, December 7). Naive Bayes classifier: Pros & Cons, applications & types explained. upGrad blog. Retrieved November 26, 2021, from https://www.upgrad.com/blog/naive-bayes-classifier/.

Discussion 3: Cornilus

A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The crux of the classifier is Bayes’ theorem. Using Bayes’ theorem, we can find the probability of A happening given that B has occurred; here, B is the evidence and A is the hypothesis. The assumption made is that the predictors/features are independent, that is, the presence of one particular feature does not affect the others. Hence it is called naive.
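
Bayes’ theorem as described above can be worked through with a small numeric sketch. All the figures here are invented for illustration: suppose 40% of emails are spam, and the word "free" appears in 30% of spam and 5% of non-spam messages.

```python
# Hypothetical probabilities for a spam-vs-ham example
p_spam = 0.4
p_word_given_spam = 0.30  # P("free" | spam)
p_word_given_ham = 0.05   # P("free" | not spam)

# Evidence: total probability of seeing the word at all
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 2))  # 0.8
```

Here the hypothesis A is "the email is spam" and the evidence B is "the word appears"; observing the evidence raises the probability of spam from 0.4 to 0.8.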

Advantages of Naïve Bayes Classifier

1. When the assumption of independent predictors holds true, a Naive Bayes classifier performs better compared to other models.

2. Naive Bayes requires only a small amount of training data to estimate the parameters needed to classify the test data, so the training period is short.

3. Naive Bayes is also easy to implement.

Disadvantages of Naïve Bayes Classifier

1. The main limitation of Naive Bayes is the assumption of independent predictors. Naive Bayes implicitly assumes that all the attributes are mutually independent. In real life, it is almost impossible to get a set of predictors that are completely independent.

2. If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a 0 (zero) probability and will be unable to make a prediction.

Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that predictors be independent. In most real-life cases the predictors are dependent, and this hinders the performance of the classifier.
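
A minimal sketch of the spam-filtering use case mentioned above, with hypothetical per-word probabilities: the "naive" independence assumption means a message's score for each class is just the product of per-word likelihoods, computed here as a sum of logs for numerical stability.

```python
import math

# Hypothetical per-word log-likelihoods, as if learned from a spam corpus
log_p_word = {
    "spam": {"free": math.log(0.30), "win": math.log(0.20), "hi": math.log(0.05)},
    "ham":  {"free": math.log(0.05), "win": math.log(0.02), "hi": math.log(0.40)},
}
log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}

def classify(words):
    """Sum log prior and per-word log-likelihoods (the naive product) per class."""
    scores = {
        label: log_prior[label] + sum(log_p_word[label][w] for w in words)
        for label in log_prior
    }
    return max(scores, key=scores.get)

print(classify(["free", "win"]))  # spam
print(classify(["hi"]))           # ham
```

Treating each word as independent of the others given the class is exactly the assumption criticized above; in real text, words co-occur in highly dependent ways, yet the classifier often still ranks the classes correctly.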

The Naive Bayes model is called “Naive” precisely because it assumes features are conditionally independent from one another despite the fact that this assumption very rarely holds. However, the Naive Bayes model can still achieve strong classification accuracy even in cases where its conditional independence assumption is significantly inaccurate.

References

The Professionals Point. Advantages and disadvantages of Naive Bayes in machine learning.

Gandhi, R. Naive Bayes classifier: What is a classifier? Towards Data Science.
