Naive Bayes Classifier
This post summarizes what I studied in the "Data Mining" course I took at university in the spring semester of 2021. Corrections are always welcome :)
Bayes' Theorem
"One assumption taken is the strong independence assumption between the features."
Naive Assumption
"In a supervised learning situation, Naive Bayes classifiers are trained very efficiently. Naive Bayes classifiers need only a small amount of training data to estimate the parameters needed for classification. Naive Bayes classifiers have a simple design and implementation, and they can be applied to many real-life situations."
The <Naive Bayes Classifier> is a supervised learning model. It performs classification using conditional probability, and its parameters can be estimated by maximum likelihood!
Naive Bayes Classifier
Our goal is to find the posterior probability $p(c_k \mid \mathbf{x})$, the probability of each label given the data's features.
Using Bayes' theorem, the posterior probability can be decomposed into conditional probabilities as follows.
\[p(c_k \mid \mathbf{x}) = \frac{p(c_k) \cdot p(\mathbf{x} \mid c_k)}{p(\mathbf{x})}\]
Here, the evidence $p(\mathbf{x})$ does not depend on the output $y$. Therefore, we can ignore the evidence and estimate $y$ as follows.
\[\begin{aligned} y &= \underset{c_k}{\text{argmax}} \; p(c_k \mid \mathbf{x}) \\ &= \underset{c_k}{\text{argmax}} \; p(c_k) \cdot p(\mathbf{x} \mid c_k) \end{aligned}\]
If the prior $p(c_k)$ is the same for every class, the prior term can be omitted as well.
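As a quick illustration, here is a minimal Python sketch of this decision rule; the prior and likelihood values below are made up purely for the example.

```python
import numpy as np

# Hypothetical values for three classes c_1, c_2, c_3 (made up for illustration).
prior = np.array([0.5, 0.3, 0.2])          # p(c_k)
likelihood = np.array([0.02, 0.10, 0.05])  # p(x | c_k) for a fixed input x

# Unnormalized posterior p(c_k) * p(x | c_k); dividing by the evidence p(x)
# would not change the argmax, so it is skipped.
score = prior * likelihood
y = np.argmax(score)
print(score, y)  # [0.01 0.03 0.01] -> predicted class index 1
```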
Now assume the data $\mathbf{x}$ consists of features $a_1, a_2, \dots, a_n$ and rewrite the equations above.
\[\begin{aligned} p(c_k \mid a_1, \dots, a_n) &= \frac{p(c_k) \cdot p(a_1, \dots, a_n \mid c_k)}{p(a_1, \dots, a_n)} \\ &\propto p(c_k) \cdot p(a_1, \dots, a_n \mid c_k) \end{aligned}\]
Here, the <NB Classifier> assumes that the features are independent of one another. Therefore,
\[\begin{aligned} p(a_1, \dots, a_n) &= p(a_1) \cdots p(a_n) \\ p(a_1, \dots, a_n \mid c_k) &= p(a_1 \mid c_k) \cdots p(a_n \mid c_k) \end{aligned}\]
Rewriting the expression with this,
\[\begin{aligned} p(c_k \mid a_1, \dots, a_n) &\propto p(c_k) \cdot p(a_1, \dots, a_n \mid c_k) \\ &= p(c_k) \cdot p(a_1 \mid c_k) \cdots p(a_n \mid c_k) \\ &= p(c_k) \cdot \prod^n_{i=1} p(a_i \mid c_k) \end{aligned}\]
If we remove the $\propto$ from the expression above and turn it into an equality, we get the following.
\[p(c_k \mid a_1, \dots, a_n) = \frac{\displaystyle p(c_k) \cdot \prod^n_{i=1} p(a_i \mid c_k)}{\displaystyle\sum_j p(c_j) \, p(\mathbf{x} \mid c_j)}\]
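Putting this formula into code, a minimal sketch for categorical features could look like the following; the prior and the per-feature likelihood tables here are hypothetical values that would normally be estimated from training data.

```python
import numpy as np

# Hypothetical example: 2 classes, 2 binary features a_1, a_2.
# prior[k]            = p(c_k)
# likelihood[k][i][v] = p(a_i = v | c_k)
prior = np.array([0.6, 0.4])
likelihood = np.array([
    [[0.7, 0.3], [0.9, 0.1]],  # class c_0
    [[0.2, 0.8], [0.5, 0.5]],  # class c_1
])

def posterior(x):
    """p(c_k | a_1, ..., a_n): prior times the product of per-feature
    likelihoods, normalized by the evidence (the denominator above)."""
    scores = prior.copy()
    for k in range(len(prior)):
        for i, v in enumerate(x):
            scores[k] *= likelihood[k, i, v]
    return scores / scores.sum()

print(posterior([1, 0]))  # posterior over classes for x = (a_1=1, a_2=0)
```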
Gaussian Naive Bayes
The <Gaussian NB> model assumes that the likelihood $p(\mathbf{x} \mid c_k)$ follows a Gaussian distribution. Since each feature of $\mathbf{x}$ is assumed independent, we define and use a Gaussian likelihood for each feature as follows.
\[p(a_i \mid c_k) = \frac{1}{\sqrt{2\pi \sigma_{i, c_k}^2}} \cdot \exp \left( - \frac{(a_i - \mu_{i, c_k})^2}{2 \sigma_{i, c_k}^2} \right)\]
Using this likelihood function to rewrite the expression gives the following.
\[\begin{aligned} p(c_k \mid a_1, \dots, a_n) &\propto p(c_k) \cdot p(a_1, \dots, a_n \mid c_k) \\ &= p(c_k) \cdot \prod^n_{i=1} p(a_i \mid c_k) \\ &= p(c_k) \cdot \prod^n_{i=1} \frac{1}{\sqrt{2\pi \sigma_{i, c_k}^2}} \cdot \exp \left( - \frac{(a_i - \mu_{i, c_k})^2}{2 \sigma_{i, c_k}^2} \right) \end{aligned}\]
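As a rough sketch (working in log space to avoid underflow when multiplying many small terms), fitting a Gaussian NB model amounts to estimating a per-class, per-feature mean and variance and plugging them into the formula above; the class name and the tiny dataset below are illustrative only.

```python
import numpy as np

class SimpleGaussianNB:
    """Minimal Gaussian Naive Bayes sketch following the derivation above."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([np.mean(y == c) for c in self.classes])      # p(c_k)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])  # mu_{i, c_k}
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes])  # sigma^2_{i, c_k}
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # log p(c_k) + sum_i log N(a_i; mu_{i, c_k}, sigma^2_{i, c_k})
            log_lik = -0.5 * (np.log(2 * np.pi * self.var)
                              + (x - self.mu) ** 2 / self.var).sum(axis=1)
            preds.append(self.classes[np.argmax(np.log(self.prior) + log_lik)])
        return np.array(preds)

# Tiny made-up dataset: two well-separated 1-D clusters.
X = np.array([[1.0], [1.2], [0.8], [5.0], [5.3], [4.7]])
y = np.array([0, 0, 0, 1, 1, 1])
print(SimpleGaussianNB().fit(X, y).predict(np.array([[1.1], [4.9]])))  # -> [0 1]
```

In practice, scikit-learn provides the same model as `sklearn.naive_bayes.GaussianNB`, which additionally applies a small variance smoothing term for numerical stability.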