๋ณธ ๊ธ€์€ 2018-2ํ•™๊ธฐ Stanford Univ.์˜ Andrew Ng ๊ต์ˆ˜๋‹˜์˜ Machine Learning(CS229) ์ˆ˜์—…์˜ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)


๋ณธ ๊ธ€์€ 2018-2ํ•™๊ธฐ Stanford Univ.์˜ Andrew Ng ๊ต์ˆ˜๋‹˜์˜ Machine Learning(CS229) ์ˆ˜์—…์˜ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

This lecture introduces a technique called GDA (Gaussian Discriminant Analysis). The name looks intimidating, but the theory behind it is nothing special. Relax and dive in 🤿!

– lecture 5


Generative Learning Algorithm

The Logistic Regression models we have seen so far belong to the family of Discriminative Learning.

Discriminative ๋ชจ๋ธ์—์„œ๋Š” $p(y \vert x)$๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. ๋ฐ˜๋ฉด์— Generative ๋ชจ๋ธ์€ $p(x \vert y)$์™€ $p(y)$๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค.1

Generative Learning์€ Bayes Rule์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•˜๋Š” ์ด๋ก ์ด๋‹ค.

$$p(y \vert x) = \frac{p(x \vert y) p(y)}{p(x)}$$

What we ultimately want is still $p(y \vert x)$. A Discriminative model learns $p(y \vert x)$ directly, whereas a Generative model defines and learns $p(x \vert y)$ and $p(y)$ and derives $p(y \vert x)$ from them indirectly. Concretely:

$$ \begin{aligned} \arg{ \max_{y} {p(y \vert x)}} &= \arg{ \max_{y} {\frac{p(x \vert y)p(y)}{p(x)}}} \\ &= \arg{ \max_{y} {p(x \vert y)p(y)}} \end{aligned} $$

In the end, Discriminative and Generative approaches share the same overall goal; the difference is only whether $p(y \vert x)$ is obtained directly or indirectly.

(์‚ฌ์ „์ง€์‹) Bayes Rule

$$p(y \vert x) = \frac{p(x \vert y) p(y)}{p(x)}$$

Bayes Rule์˜ ์šฉ์–ด๋ฅผ ์ •๋ฆฌํ•ด๋ณด์ž.

  • $p(y \vert x)$: posterior probability
    • The probability of label Y given data X.
    • This is the quantity used as the classification criterion.
  • $p(y)$: prior probability
    • Obtained from the distribution of the ground-truth labels.
    • (number of examples with label y) / (total number of examples)
  • $p(x \vert y)$: likelihood
    • The distribution of the data carrying label Y.
    • This is the probability a Generative Model models and learns.
  • $p(x)$: evidence
    • It usually can neither be computed nor needs to be: it is constant with respect to $y$, so it drops out of the $\arg\max$.
    • For that reason it is mostly ignored.
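
The terms above fit together in a tiny numerical sketch (all probability values here are made up for illustration). The point is that since $p(x)$ is the same for every class, comparing $p(x \vert y)p(y)$ is enough to pick the most probable label:

```python
# Toy Bayes-rule classification; all numbers are made up for illustration.
# Since p(x) is identical across classes, argmax_y p(y|x) = argmax_y p(x|y) p(y).

prior = {0: 0.7, 1: 0.3}        # p(y), e.g. from label frequencies
likelihood = {0: 0.1, 1: 0.9}   # p(x|y) for one particular observation x

# Unnormalized posterior scores: p(x|y) * p(y)
scores = {y: likelihood[y] * prior[y] for y in prior}
prediction = max(scores, key=scores.get)

# Normalizing by p(x) = sum_y p(x|y) p(y) gives the true posterior,
# but it does not change the argmax.
p_x = sum(scores.values())
posterior = {y: scores[y] / p_x for y in scores}

print(prediction)   # 1
```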


In terms of classification problems: Logistic Regression is a Discriminative Model, while the Naive Bayes Classifier is a Generative Model. GDA (Gaussian Discriminant Analysis), discussed below, is also a Generative Model.


Gaussian Discriminant Analysis (GDA)

GDA๋Š” ์ด๋ฆ„์— โ€˜Discriminantโ€™๊ฐ€ ๋“ค์–ด๊ฐ€์ง€๋งŒ, Generative Model์ด๋‹ค. GDA์—์„œ๋Š” $p(x \vert y)$๊ฐ€ multivariate normal distribution์„ ๋งŒ์กฑํ•œ๋‹ค๊ณ  โ€˜๊ฐ€์ •โ€™ํ•œ๋‹ค. GDA์— ๋Œ€ํ•ด ๋ณธ๊ฒฉ์ ์œผ๋กœ ๋‹ค๋ฃจ๊ธฐ ์ „์— multivariate normal distribution์„ ๊ฐ€๋ณ๊ฒŒ ์‚ดํŽด๋ณด์ž.

(์‚ฌ์ „์ง€์‹) Multi-variate normal distribution

๋ฐ”ํƒ•์ด ๋˜๋Š” uni-variate Gaussian ๋ถ„ํฌ๋ฅผ ๋จผ์ € ์‚ดํŽด๋ณด์ž.

$$\mathcal{N}(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\sigma}\exp{\left[ -\frac{(x-\mu)^2}{2\sigma^2}\right]}$$
  • $E[x]=\mu$
  • $\textrm{Cov}(x) = E[(x-\mu)^2] = \sigma^2$

Now consider the multivariate Gaussian. In the multivariate case the mean becomes a mean vector $\mu \in \mathbb{R}^n$, and the variance becomes the Covariance, expressed as a covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$.

$$\mathcal{N}(X; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}{\lvert \Sigma \rvert}^{1/2}} \exp{\left[ -\frac{1}{2}(X - \mu)^{T}\Sigma^{-1}(X-\mu) \right]}$$

Here, $\lvert \Sigma \rvert$ is the determinant of the Covariance Matrix $\Sigma$. The mean and covariance are given by:

  • $E[X] = \int_{x}{x p(x; \mu, \Sigma) dx}$
  • $\textrm{Cov}(X) = E[(X-E[X])(X-E[X])^{T}]$
  • When $\mu$ is the zero vector (zero mean) and $\Sigma = I$ (identity covariance), the distribution is called the standard normal distribution.
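
The density formula, with its normalization constant $(2\pi)^{n/2}{\lvert \Sigma \rvert}^{1/2}$, can be sketched directly in numpy. This is a minimal hand-rolled version for illustration, not optimized library code:

```python
import numpy as np

def multivariate_normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) evaluated at the n-dimensional point x."""
    n = mu.shape[0]
    diff = x - mu
    # Normalization constant: 1 / ((2*pi)^(n/2) * |Sigma|^(1/2))
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Standard normal in 2D: zero mean, identity covariance.
density_at_origin = multivariate_normal_pdf(np.zeros(2), np.zeros(2), np.eye(2))
print(density_at_origin)   # 1 / (2*pi) ≈ 0.15915
```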

GDA Modeling

To model a binary classification problem with GDA, we make the following assumptions.

  • $y \sim \textrm{Bernoulli}(\phi)$
    • Strictly speaking, this part is not an assumption: in a binary classification problem, $y$ can only follow a Bernoulli distribution, so the distribution of $y$ is determined automatically by the problem being solved.
    • If $\phi = 0.5$, it becomes a uniform distribution over the two labels.
  • $x \vert y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$
  • $x \vert y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$


๋ถ„ํฌ๋ฅผ ์‹์œผ๋กœ ๊ธฐ์ˆ ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  • $p(y) = \phi^y (1-\phi)^{(1-y)}$
  • $p(x \vert y=0) = \frac{1}{(2\pi)^{n/2}{\lvert \Sigma \rvert}^{1/2}}\exp{\left[ -\frac{1}{2}(x - \mu_0)^{T}\Sigma^{-1}(x-\mu_0) \right]}$
  • $p(x \vert y=1) = \frac{1}{(2\pi)^{n/2}{\lvert \Sigma \rvert}^{1/2}}\exp{\left[ -\frac{1}{2}(x - \mu_1)^{T}\Sigma^{-1}(x-\mu_1) \right]}$
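
Read generatively, these equations are a recipe for sampling data: first draw the label $y$, then draw $x$ from the corresponding class-conditional Gaussian. A small sketch with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up GDA parameters, for illustration only.
phi = 0.5                          # p(y = 1)
mu_0 = np.array([0.0, 0.0])
mu_1 = np.array([2.0, 2.0])
sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])     # shared covariance matrix

def sample(m):
    """Draw m (x, y) pairs from the GDA generative model."""
    y = rng.binomial(1, phi, size=m)                 # y ~ Bernoulli(phi)
    means = np.where(y[:, None] == 1, mu_1, mu_0)    # mu_{y^{(i)}} per example
    x = np.array([rng.multivariate_normal(mean, sigma) for mean in means])
    return x, y

x, y = sample(500)
print(x.shape, y.mean())   # (500, 2), and a label fraction close to phi
```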

์šฐ๋ฆฌ์˜ GDA ๋ชจ๋ธ์˜ parameter๋ฅผ ์ •๋ฆฌํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

  • $\phi \in \mathbb{R}$
  • $\mu_0, \mu_1 \in \mathbb{R}^n$
  • $\Sigma \in \mathbb{R}^{n \times n}$ 2

์šฐ๋ฆฌ๋Š” ์œ„์˜ $\phi$, $\mu_0$, $\mu_1$, $\Sigma$๋ฅผ ํ•™์Šต์‹œํ‚ฌ ๊ฒƒ์ด๋‹ค!


Let's define the joint likelihood $L(\phi, \mu_0, \mu_1, \Sigma)$.

$$ \begin{aligned} L(\phi, \mu_0, \mu_1, \Sigma) &= \prod_{i=1}^{m}{p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma)} \\ &= \prod_{i=1}^{m}{ p(x^{(i)} \vert y^{(i)})p(y^{(i)}) } \end{aligned} $$

For comparison, recall the conditional likelihood used in Discriminative Learning.

$$L(\theta) = \prod_{i=1}^{m}{p(y^{(i)} \vert x^{(i)}; \theta)}$$

parameter์˜ ์ธก๋ฉด์—์„œ $\theta$์™€ $\phi$, $\mu_0$, $\mu_1$, $\Sigma$๋กœ ์ฐจ์ด๊ฐ€ ์žˆ๊ณ , Maximize ๋Œ€์ƒ๋„ Discriminant Learning์˜ ๊ฒฝ์šฐ $p(y \vert x)$๋ฅผ Maximizeํ•˜๋Š” ๋ฐ˜๋ฉด Generative Learning์€ $p(x \vert y)p(y)$๋ฅผ Maximizeํ•˜๊ณ  ์žˆ๋‹ค.

MLE on GDA

Now let's maximize the $L(\phi, \mu_0, \mu_1, \Sigma)$ we defined. As usual, instead of $L(\phi, \mu_0, \mu_1, \Sigma)$ itself we maximize its logarithm $l(\phi, \mu_0, \mu_1, \Sigma)$.

$$\max_{\{ \phi, \mu_0, \mu_1, \Sigma \}} {\left[ l(\phi, \mu_0, \mu_1, \Sigma) \right]}$$

$l$์„ Maximizing ํ•˜๋Š” parameter์˜ ๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ๊ฐ•์˜์—์„œ๋„ ์œ ๋„ ๊ณผ์ •์€ ์ƒ๋žตํ•˜์˜€๋‹ค. (์•„๋งˆ parameter ํ•˜๋‚˜ ์žก๊ณ  ๋ฏธ๋ถ„ํ•ด์„œ ์œ ๋„ํ•  ๋“ฏ?)

  • $\phi = \frac{\sum_{i=1}^{m} {y^{(i)}}}{m} = \frac{\sum_{i=1}^{m} {1\{y^{(i)}=1\}}}{m}$
  • $\mu_0 = \frac{\sum_{i=1}^{m} { 1\{y^{(i)}=0\} x^{(i)} }}{\sum_{i=1}^{m} {1\{y^{(i)}=0\}}}$
  • $\mu_1 = \frac{\sum_{i=1}^{m} { 1\{y^{(i)}=1\} x^{(i)} }}{\sum_{i=1}^{m} {1\{y^{(i)}=1\}}}$
  • $\Sigma = \frac{\sum_{i=1}^{m} {(x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^{T}}}{m}$
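
Since these closed-form estimates are just indicator-weighted counts and averages, they fit in a few lines of numpy. A sketch assuming `x` is an `(m, n)` array and `y` an `(m,)` array of 0/1 labels; the tiny dataset is made up for illustration:

```python
import numpy as np

def fit_gda(x, y):
    """Closed-form MLE for the GDA parameters phi, mu_0, mu_1, Sigma."""
    m = x.shape[0]
    phi = y.mean()                                  # fraction of y = 1 examples
    mu_0 = x[y == 0].mean(axis=0)                   # mean of class-0 feature vectors
    mu_1 = x[y == 1].mean(axis=0)                   # mean of class-1 feature vectors
    mu_y = np.where(y[:, None] == 1, mu_1, mu_0)    # mu_{y^{(i)}} for each example
    diff = x - mu_y
    sigma = diff.T @ diff / m                       # shared covariance matrix
    return phi, mu_0, mu_1, sigma

# Tiny hand-made dataset: two class-0 points, two class-1 points.
x = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])
phi, mu_0, mu_1, sigma = fit_gda(x, y)
print(phi)    # 0.5
print(mu_0)   # [0.5 0. ]
print(mu_1)   # [3.5 3. ]
```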

$\mu_0$์„ ์ž˜ ์‚ดํŽด๋ณด์ž. $\mu_0$์˜ ๊ฒฐ๊ณผ๋ฅผ ๋ง๋กœ ํ’€์–ด์“ฐ๋ฉด, โ€œ$y=0$์ธ feacture vector๋“ค์˜ ํ•ฉ์„ $y=0$์˜ ์ˆ˜๋กœ ๋‚˜๋ˆˆ ๊ฒƒโ€ ์ฆ‰, ํ‰๊ท ์ด๋‹ค!! ์ด ๊ฒฐ๊ณผ๋Š” $\mu_0$๊ฐ€ $y=0$์ธ ์ •๋‹ต์— ๋Œ€ํ•œ ํ‰๊ท ์ด๋ผ๋Š” ์ •์˜์™€๋„ ์˜๋ฏธ๊ฐ€ ํ†ตํ•œ๋‹ค.

This result can be shown graphically as follows.

The straight line drawn in the figure above acts as the decision boundary where $p(y=1 \vert x)=0.5$!!

MLE์˜ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์šฐ๋ฆฌ๋Š” $\phi$, $\mu_0$, $\mu_1$, $\Sigma$์˜ ์ •ํ™•ํ•œ ๊ฐ’์„ ์–ป๊ฒŒ ๋˜์—ˆ๋‹ค. ์ด parameter๋“ค์„ ํ™œ์šฉํ•ด prediction ํ•  ์ˆ˜ ์žˆ๋‹ค.

$$\arg{ \max_{y} {p(y \vert x)}} = \arg{ \max_{y} {p(x \vert y)p(y)}}$$
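
A prediction sketch under fitted parameters might look as follows. The parameter values here are hypothetical, and log-probabilities are compared instead of raw densities to avoid numerical underflow (the $\arg\max$ is unchanged because $\log$ is monotonic):

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """log N(x; mu, sigma) for an n-dimensional x."""
    n = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (n * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.inv(sigma) @ diff)

def predict(x, phi, mu_0, mu_1, sigma):
    """argmax over y of log p(x|y) + log p(y)."""
    score_0 = log_gaussian(x, mu_0, sigma) + np.log(1 - phi)
    score_1 = log_gaussian(x, mu_1, sigma) + np.log(phi)
    return int(score_1 > score_0)

# Hypothetical fitted parameters:
phi = 0.5
mu_0 = np.array([0.0, 0.0])
mu_1 = np.array([2.0, 2.0])
sigma = np.eye(2)

print(predict(np.array([0.1, -0.2]), phi, mu_0, mu_1, sigma))  # 0
print(predict(np.array([1.9, 2.2]), phi, mu_0, mu_1, sigma))   # 1
```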

GDA vs. Logistic Regression

๊ณ ์ •๋œ $\phi$, $\mu_0$, $\mu_1$, $\Sigma$์— ๋Œ€ํ•ด $p(y=1 \vert \phi, \mu_0, \mu_1, \Sigma)$๋ฅผ $x$์˜ ํ•จ์ˆ˜๋กœ ๊ทธ๋ ค๋ณด์ž.

Then $p(y=1 \vert x; \phi, \mu_0, \mu_1, \Sigma)$ comes out with the shape of a sigmoid!!
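
The sigmoid shape is not a coincidence. Plugging the two Gaussian class-conditionals into Bayes Rule, the shared $\Sigma$ makes the quadratic terms in $x$ cancel, so the log-odds are linear in $x$:

$$ \begin{aligned} p(y=1 \vert x) &= \frac{p(x \vert y=1)p(y=1)}{p(x \vert y=1)p(y=1) + p(x \vert y=0)p(y=0)} \\ &= \frac{1}{1 + \exp{\left[ -\log{\frac{p(x \vert y=1)p(y=1)}{p(x \vert y=0)p(y=0)}} \right]}} \\ &= \frac{1}{1 + \exp{\left[ -(\theta^{T} x + \theta_0) \right]}} \end{aligned} $$

where $\theta = \Sigma^{-1}(\mu_1 - \mu_0)$ and $\theta_0 = \frac{1}{2}\mu_0^{T}\Sigma^{-1}\mu_0 - \frac{1}{2}\mu_1^{T}\Sigma^{-1}\mu_1 + \log{\frac{\phi}{1-\phi}}$, which is exactly the form of the Logistic Regression hypothesis.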


์œ„์˜ ์‚ฌ์‹ค์€ GDA์™€ Logistic Regression์ด ๋ณธ์งˆ์ ์œผ๋กœ ๋™์ผํ•˜๋‹ค๋Š” ๊ฒƒ์„ ๋งํ•œ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์šฐ๋ฆฌ๋Š” ์–ธ์ œ GDA๋ฅผ ์“ฐ๊ณ , ์–ธ์ œ Logistic Regression์„ ์จ์•ผ ํ• ๊นŒ??

GDA์—์„œ ํ•˜๋Š” ๊ฐ€์ •๋“ค์€ Logistic Regression์—์„œ ํ•˜๋Š” hypothesis $h_{\theta}(x)$์˜ sigmoid ๊ฐ€์ •๋ณด๋‹ค ๋” ๊ฐ•๋ ฅํ•˜๋‹ค. ๊ทธ๋ž˜์„œ GDA๋Š” Logistic Regression์„ ์•”์‹œ(imply)ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฐ˜๋Œ€ ๋ฐฉํ–ฅ์€ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค. ์ฆ‰, $p(y \vert x)$์ด sigmoid๋ผ๊ณ  ํ•ด์„œ $p(x \vert y)$๊ฐ€ multivariate normal distribution์ธ ๊ฒƒ์€ ์•„๋‹ˆ๋‹ค.

GDA๋Š” ๋” ๊ฐ•๋ ฅํ•œ ๊ฐ€์ •์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํ˜„์‹ค์˜ dataset์ด ๊ทธ ๊ฐ€์ •์„ ๋งŒ์กฑํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด, ์•ˆ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค. ๊ทธ๋ž˜์„œ dataset์˜ ๋ถ„ํฌ๋ฅผ ์ •ํ™•ํžˆ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด, GDA๋กœ ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ๊ทธ๋ ‡์ง€ ์•Š๋‹ค๋ฉด Logistic Regression์˜ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๊ธธ ๊ถŒ์žฅํ•œ๋‹ค. Logistic Regression์€ ๋” ์ ์€ ๊ฐ€์ •์„ ์ฑ„์šฉํ•˜๋Š” ๋Œ€์‹ ์— ๋” robust ํ•˜๊ณ  ์ž˜๋ชป๋œ ๋ชจ๋ธ๋ง์— ๋œ ๋ฏผ๊ฐํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

์ผ๋ฐ˜์ ์œผ๋กœ Logistic Regression๊ณผ ๋น„๊ตํ–ˆ์„ ๋•Œ, GDA๋Š” small dataset์—์„œ ๋” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ธ๋‹ค๊ณ  ํ•œ๋‹ค. ๋ฐ˜๋ฉด, huge dataset์—์„œ๋Š” Logistic Regression์ด ๋” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ธ๋‹ค. ๋‹จ, ๋งŒ์•ฝ $p(x \vert y)$์— ๋Œ€ํ•œ GDA์˜ ๊ฐ€์ •์ด ์˜ณ๋‹ค๋ฉด, huge dataset์—์„œ๋„ GDA๊ฐ€ ๋” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ธ๋‹ค.

์š”์ฆ˜์€ CIFAR, ImageNet๊ณผ ๊ฐ™์€ huge dataset์ด ์ž˜ ๊ตฌ์ถ•๋˜์–ด ์žˆ์–ด, Logistic Regression์ด ๋” ๊ฐ•์„ธ๋ฅผ ๋ณด์ธ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ dataset์ด ์ž˜ ๊ตฌ์ถ•๋˜์ง€ ์•Š์•˜๊ฑฐ๋‚˜ dataset์˜ ํฌ๊ธฐ๋ฅผ 100๊ฐœ๋กœ ์ œํ•œํ•œ ์ƒํ™ฉ์ด๋ผ๋ฉด, GDA๊ฐ€ ๋” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ผ ์ˆ˜ ์žˆ๋‹ค.


  1. In practice, the modeling effort goes into $p(x \vert y)$; $p(y)$ is trivial to obtain from the label frequencies rather than being learned in any deep sense.

  2. There are two mean vectors, $\mu_0$ and $\mu_1$, but only one Covariance matrix $\Sigma$. This sharing is itself one of the assumptions introduced when modeling GDA, a kind of design choice! If desired, it can be split into $\Sigma_0$ and $\Sigma_1$.