Naive Bayes Classifier

A simple Bayesian classifier which assumes that the attributes are conditionally independent of each other given the class, which is why we can write $P (c ∣ e_{1}, \dots, e_{n}) = α \cdot P (c) \cdot \prod_{i} P (e_{i} ∣ c)$, where $α$ is a normalization constant that depends only on the evidence $e_{1}, \dots, e_{n}$. Assuming boolean variables (which we can always achieve via Binning and One-Hot-Encoding), we have the following parameters in the CPTs: $θ = P (C = T)$, $θ_{i 1} = P (X_{i} = T ∣ C = T)$, and $θ_{i 2} = P (X_{i} = T ∣ C = F)$.

We can then make a prediction with $P (C ∣ x_{1}, \dots, x_{n}) = α \cdot P (C) \cdot \prod_{i} P (x_{i} ∣ C)$. The nice thing is that with $n$ boolean attributes there are only $2 n + 1$ parameters, and no search is required to find them: they can be estimated directly from counts in the data. Naive Bayes models often perform well in practice, even when the data is noisy.
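The prediction formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `predict_posterior`, its argument names, and the parameter values in the example are all made up for this note.

```python
from math import prod

def predict_posterior(theta, theta_true, theta_false, x):
    """Return P(C = T | x_1..x_n) for boolean attributes under naive Bayes.

    theta       : P(C = T)
    theta_true  : list of P(X_i = T | C = T)  (the theta_i1)
    theta_false : list of P(X_i = T | C = F)  (the theta_i2)
    x           : list of observed booleans x_1..x_n
    """
    # Unnormalized score for C = T: P(C = T) * prod_i P(x_i | C = T)
    score_t = theta * prod(p if xi else 1 - p for p, xi in zip(theta_true, x))
    # Unnormalized score for C = F: P(C = F) * prod_i P(x_i | C = F)
    score_f = (1 - theta) * prod(p if xi else 1 - p for p, xi in zip(theta_false, x))
    # alpha rescales the two scores so that they sum to 1
    alpha = 1 / (score_t + score_f)
    return alpha * score_t

# Hypothetical parameters for n = 2 attributes (2n + 1 = 5 numbers in total).
posterior = predict_posterior(0.5, [0.9, 0.2], [0.3, 0.6], [True, False])
print(posterior)
```

With these made-up numbers the two unnormalized scores are 0.36 and 0.06, so the posterior for $C = T$ is 0.36 / 0.42. The sketch does not guard against the case where both scores are zero.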

Calculation of Probabilities

If the attribute is categorical: $P (x_{k} ∣ C_{i})$ is the number of tuples in class $C_{i}$ having value $x_{k}$, divided by the number of tuples in class $C_{i}$.
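The counting rule for categorical attributes could look like this in Python. The helper name `conditional_probabilities` and the toy data (an "outlook" attribute with a "play" class label) are assumptions for illustration only.

```python
from collections import Counter

def conditional_probabilities(values, labels):
    """Estimate P(x_k | C_i) for one categorical attribute by counting."""
    class_counts = Counter(labels)              # number of tuples per class C_i
    pair_counts = Counter(zip(values, labels))  # tuples in C_i having value x_k
    return {
        (value, label): count / class_counts[label]
        for (value, label), count in pair_counts.items()
    }

# Hypothetical toy data: one attribute value and one class label per tuple.
outlook = ["sunny", "sunny", "rainy", "rainy", "sunny"]
play = ["yes", "no", "yes", "yes", "no"]
probs = conditional_probabilities(outlook, play)
print(probs[("sunny", "yes")])  # 1 of the 3 "yes" tuples is "sunny"
```

Combinations that never occur in the data (here "rainy" together with "no") are simply absent from the result, which corresponds to an estimated probability of 0.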

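If the attribute is continuous: $P (x_{k} ∣ C_{i})$ is usually computed from a Normal Distribution fitted to the class, with $P (x_{k} ∣ C_{i}) = G (x_{k}, μ_{C_{i}}, σ_{C_{i}})$, where $μ_{C_{i}}$ and $σ_{C_{i}}$ are the mean and standard deviation of the attribute within class $C_{i}$.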
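For reference, and assuming $G$ denotes the usual Gaussian density with mean $μ$ and standard deviation $σ$, it can be written out as:

$$G (x, μ, σ) = \frac{1}{\sqrt{2 π} \, σ} \exp \left( - \frac{(x - μ)^{2}}{2 σ^{2}} \right)$$

Note that this is a density rather than a probability, so its value can exceed 1 when $σ$ is small. That is not a problem for classification, since only the relative size of the class scores matters.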

Pros

easy to implement

still gives good results in many cases

Cons

the independence assumption leads to a loss of accuracy

as in practice there is almost always some kind of dependency amongst the variables

→ such dependencies can be modelled with a Bayesian Belief Network

Implementation

Use the function below to test whether a data point is more likely under class 1 or class 0: estimate loc (the mean) and scale (the standard deviation) from the data of each class, then compare the resulting densities at the data point.

scipy.stats.norm.pdf(value, loc, scale)
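A minimal sketch of that comparison for a single continuous attribute, assuming `numpy` and `scipy` are installed. The helper name `more_likely_class` and the sample values are hypothetical; using the sample standard deviation (`ddof=1`) is one possible choice, not something these notes prescribe.

```python
import numpy as np
from scipy.stats import norm

def more_likely_class(value, samples_class0, samples_class1):
    """Return 1 if `value` has the higher Gaussian density under class 1, else 0."""
    densities = []
    for samples in (samples_class0, samples_class1):
        loc = np.mean(samples)           # estimated mean of this class
        scale = np.std(samples, ddof=1)  # estimated sample standard deviation
        densities.append(norm.pdf(value, loc, scale))
    return int(densities[1] > densities[0])

# Hypothetical measurements of one continuous attribute for each class.
class0 = [4.8, 5.1, 5.0, 4.9, 5.2]
class1 = [6.9, 7.2, 7.0, 7.1, 6.8]
label = more_likely_class(6.5, class0, class1)
print(label)
```

This compares only the class-conditional densities of one attribute. For a full Naive Bayes prediction, the densities of all attributes would be multiplied together with the class prior $P (C)$ before comparing.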
