This post summarizes the Machine Learning (CS229) course taught by Prof. Andrew Ng at Stanford Univ. in the second semester of 2018. Corrections are always welcome :)


- lecture 4

Caution!: This post covers a topic that I have not fully understood, so it has many shortcomings.


In this post we look at the GLM (Generalized Linear Model), a generalized form of linear model that encompasses both the Linear Regression and Logistic Regression models covered earlier.


(Prerequisite) Bernoulli Distribution

A kind of discrete probability distribution. It is the distribution of the label in binary classification, $y \in \{0, 1\}$. It has the following form, where $\phi$ is the 'probability of the event', i.e. $p(y = 1)$.

$$p(y; \phi) = \phi^{y} (1-\phi)^{(1-y)}$$

The Exponential Family

There are surely countless distributions in the world. Still, people have formalized and analyzed a handful of them in closed form, such as the Gaussian and Bernoulli distributions.

Along the way, a certain 'pattern' was noticed across distributions, and the distributions sharing that pattern were collected into a set called a family. The Exponential Family covered here is the set of probability distributions that show one such pattern.

If a probability distribution can be written in the following form, we say that "the distribution belongs to the Exponential Family."

$$p(y; \eta) = b(y) \exp{\left( \eta^{T} T(y) - a(\eta) \right)} $$

The variables and functions that appear in this form have the following names. (They are not important enough to memorize.)

  • $y$: data
  • $\eta$: natural parameter (of distribution)
  • $T(y)$: sufficient statistic
  • $b(y)$: base measure
  • $a(\eta)$: log partition function

Let's look at this expression in a little more detail.

$$p(y; \eta) = \frac{b(y) \exp{\left( \eta^{T} T(y) \right)}}{e^{a(\eta)}} $$
  • $\eta$ is the parameter of the distribution.
  • In most cases $T(y)$ is simply set to $y$.
  • $\eta$ is a vector and $T(y)$ is a vector-valued function, whereas $b(y)$ and $a(\eta)$ are scalar functions.
  • $a(\eta)$ normalizes the distribution. With $a(\eta)$ chosen appropriately, the distribution integrates (or sums) to 1.
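
To make the normalizing role concrete: requiring the distribution to integrate to 1 pins down $a(\eta)$. This is a standard consequence of the definition rather than something stated in the notes above, and for discrete $y$ the integral becomes a sum.

$$ \int b(y) \exp{\left( \eta^{T} T(y) - a(\eta) \right)} \, dy = 1 \quad \Longleftrightarrow \quad a(\eta) = \log{\int b(y) \exp{\left( \eta^{T} T(y) \right)} \, dy} $$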

From here on, we will see that the Bernoulli and Gaussian distributions belong to the Exponential Family!


Bernoulli Distribution ∈ Exponential Family

The Bernoulli Distribution is a probability distribution over binary data. With $\phi$ as the probability of the event, it is written as follows.

$$p(y; \phi) = \phi^{y}(1-\phi)^{(1-y)}$$

We will show that the Bernoulli Distribution belongs to the Exponential Family by applying some Algebraic Massage [1] to the expression above!

Let's transform the Bernoulli expression as follows.

$$ \begin{split} p(y; \phi) &= \phi^{y}(1-\phi)^{(1-y)} \\ &= \exp{ \left[\log{ \left(\phi^{y}(1-\phi)^{(1-y)}\right) }\right] } \\ &= \exp{ \left[ \left( \log{ \left( \frac{\phi}{1-\phi} \right) } \right)y + \log{(1-\phi)} \right] } \end{split} $$

Reading off $\eta$, $T(y)$, $a(\eta)$, and $b(y)$ from the expression above, we get

  • $\eta$: $\log{(\phi / (1-\phi))}$
  • $T(y)$: $y$
  • $a(\eta)$: $-\log{(1-\phi)}$
    • Using $\eta = \log{(\phi / (1-\phi))}$, this can be rewritten in terms of $\eta$:
    • $\phi = 1/(1+e^{-\eta})$
    • $a(\eta) = \log{(1+e^{\eta})}$
  • $b(y)$: $1$
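
The rewrite of $a(\eta)$ in terms of $\eta$ skips a couple of steps. A worked version, using only the definitions in the list above:

$$ \begin{split} \eta = \log{\frac{\phi}{1-\phi}} \;&\Longrightarrow\; e^{\eta} = \frac{\phi}{1-\phi} \;\Longrightarrow\; \phi = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}} \\ a(\eta) = -\log{(1-\phi)} &= -\log{\frac{1}{1+e^{\eta}}} = \log{(1+e^{\eta})} \end{split} $$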

That is, by rearranging the original Bernoulli Distribution and choosing $\eta$, $T(y)$, $a(\eta)$, and $b(y)$ appropriately, we have shown that the Bernoulli Distribution belongs to the Exponential Family!
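
As a quick sanity check, the two forms can be compared numerically. This is only a minimal Python sketch; the function names and the value $\phi = 0.3$ are made up for illustration.

```python
import math

def bernoulli_pmf(y, phi):
    """Standard form: phi^y * (1 - phi)^(1 - y), for y in {0, 1}."""
    return phi ** y * (1.0 - phi) ** (1 - y)

def bernoulli_expfam(y, eta):
    """Exponential Family form: b(y) * exp(eta * T(y) - a(eta))."""
    b = 1.0                              # base measure b(y) = 1
    t = y                                # sufficient statistic T(y) = y
    a = math.log(1.0 + math.exp(eta))    # log partition function a(eta)
    return b * math.exp(eta * t - a)

phi = 0.3                                # arbitrary example value
eta = math.log(phi / (1.0 - phi))        # natural parameter (log-odds)

# Both forms should agree for every possible outcome.
for y in (0, 1):
    assert math.isclose(bernoulli_pmf(y, phi), bernoulli_expfam(y, eta))

# The probabilities sum to 1, which is exactly what a(eta) is there for.
total = bernoulli_expfam(0, eta) + bernoulli_expfam(1, eta)
```

If the asserts pass silently, the Exponential Family form reproduces the original Bernoulli probabilities for that $\phi$.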


Gaussian Distribution ∈ Exponential Family

Next, let's see that the Gaussian Distribution belongs to the Exponential Family. Here the variance $\sigma^{2}$ is treated as a fixed constant rather than as a parameter; we assume $\sigma^{2} = 1$.

The Gaussian density is as follows.

$$ \begin{split} p(y; \mu) &= \frac{1}{\sqrt{2\pi}\sigma} \exp{\left( - \frac{(y-\mu)^2}{2\sigma^{2}} \right)} \\ &= \frac{1}{\sqrt{2\pi}} \exp{\left( - \frac{(y-\mu)^2}{2} \right)} \end{split} $$

Now we apply some Algebraic Massage to the expression above.

$$ \begin{split} p(y; \mu) &= \frac{1}{\sqrt{2\pi}} \exp{\left( - \frac{(y-\mu)^2}{2} \right)} \\ &= \frac{1}{\sqrt{2\pi}} \exp{\left(-\frac{y^2}{2}\right)} \cdot \exp{\left(\mu y - \frac{1}{2} {\mu^{2}} \right)} \end{split} $$

Reading off $\eta$, $T(y)$, $a(\eta)$, and $b(y)$ from the expression above, we get

  • $\eta$: $\mu$
  • $T(y)$: $y$
  • $a(\eta)$: $\mu^{2} / 2 = \eta^{2} / 2$
  • $b(y)$: $(1/\sqrt{2\pi})\exp{(-y^{2} / 2)}$

That is, by rearranging the original Gaussian Distribution and choosing $\eta$, $T(y)$, $a(\eta)$, and $b(y)$ appropriately, we have shown that the Gaussian Distribution belongs to the Exponential Family!
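
The same kind of numerical check works for the unit-variance Gaussian. Again a minimal Python sketch with made-up function names and test values, not part of the lecture:

```python
import math

def gaussian_pdf(y, mu):
    """Unit-variance Gaussian density N(mu, 1)."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def gaussian_expfam(y, eta):
    """Exponential Family form: b(y) * exp(eta * T(y) - a(eta))."""
    b = math.exp(-0.5 * y ** 2) / math.sqrt(2.0 * math.pi)  # base measure
    t = y                                                    # T(y) = y
    a = 0.5 * eta ** 2                                       # a(eta) = eta^2 / 2
    return b * math.exp(eta * t - a)

mu = 1.5                                  # arbitrary example mean
for y in (-2.0, 0.0, 0.7, 3.0):
    # With eta = mu the two forms should give the same density.
    assert math.isclose(gaussian_pdf(y, mu), gaussian_expfam(y, eta=mu))
```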


Besides the Bernoulli and Gaussian distributions seen so far, many other probability distributions also belong to the Exponential Family. [2]


Generalized Linear Model

From here on, we discuss how to model regression and classification problems in a general form. We treat the given input as $x$ and the random variable $y$ as depending on $x$, and we predict the probability distribution of $y$ given $x$.

When building a GLM, we make the following three assumptions.

  1. $y \vert x; \theta \sim \textrm{ExponentialFamily}(\eta)$
    That is, given $x$ and $\theta$, $y$ follows an Exponential Family distribution with parameter $\eta$.
  2. The natural parameter $\eta$ and the input $x$ are linearly related: $\eta = \theta^{T} x$
  3. Through learning, we want the prediction $h(x)$ to be the expected value of $y$ given $x$ [3]: $h(x) = \textrm{E}[y \vert x; \theta]$

With these three assumptions, or design choices, we obtain an elegant Generalized Linear Model. GLMs matter because they have several properties that are useful for learning; for example, the log-likelihood is concave in $\theta$, so maximum-likelihood training has no spurious local optima!


Let's picture the setup:

  • By assumption (2), $\eta=\theta^{T}x$, so $\eta$ is the output of the linear model.
  • Depending on the input data and the goal, we design an appropriate distribution, i.e. we choose appropriate $a$, $b$, and $T$.

Now let's see how the model works in the training and test phases.

  • Training only changes the linear model, i.e. $\theta$. The distribution itself (the choice of $a$, $b$, and $T$) is fixed by design and is not something we learn.
  • At test time, we use the output of the distribution.
    • The output of the distribution is its mean $\textrm{E}[y \vert x ; \theta]$, which by assumption (3) is $h_{\theta}(x)$.
    • During training, this output is compared against the true label $y$, and the difference serves as the signal for updating $\theta$.
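
The training/test flow above can be sketched in Python for the Bernoulli case. Several things here are assumptions not stated in the text: the update rule $\theta := \theta + \alpha \, (y - h_{\theta}(x)) \, x$ (stochastic gradient ascent on the log-likelihood), the learning rate, the toy data, and all function names.

```python
import math
import random

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def predict(theta, x):
    """h_theta(x) = E[y | x; theta] for a Bernoulli response."""
    eta = sum(t * xi for t, xi in zip(theta, x))   # assumption (2): eta = theta^T x
    return sigmoid(eta)                            # assumption (3): mean of the distribution

def train(data, lr=0.1, epochs=200):
    """Only theta is updated; the distribution (Bernoulli) stays fixed."""
    theta = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            error = y - predict(theta, x)          # compare the output with the label y
            theta = [t + lr * error * xi for t, xi in zip(theta, x)]
    return theta

# Made-up toy data: x = (1, feature), label is 1 when feature > 0.
random.seed(0)
features = [random.uniform(-2.0, 2.0) for _ in range(100)]
data = [((1.0, f), 1 if f > 0 else 0) for f in features]

theta = train(data)
p_pos = predict(theta, (1.0, 1.5))    # should lean towards 1
p_neg = predict(theta, (1.0, -1.5))   # should lean towards 0
```

Swapping `sigmoid` for the identity function would give the Gaussian (least squares) case with the same update rule, which is one reason the GLM view is convenient.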

Ordinary Least Squares

Let's digest the GLM principles presented so far by looking at Ordinary Least Squares, one special case of the GLM.

We assume that the target variable $y$ to be predicted (also called the response variable in GLM terminology) is continuous and follows a Gaussian $\mathcal{N}(\mu, \sigma^{2})$.

Since the Gaussian belongs to the Exponential Family, the Gaussian parameter $\mu$ becomes the Exponential Family's $\eta$: $\mu = \eta$

The hypothesis $h_{\theta}(x)$ used in Ordinary Least Squares is then derived as follows.

$$ \begin{split} h_{\theta}(x) &= \textrm{E}[y \vert x ; \theta] \\ &= \mu \\ &= \eta \\ &= \theta^{T}x \end{split} $$

Let's go through each step.

  • $h_{\theta}(x) = \textrm{E}[y \vert x ; \theta]$ follows from the third assumption.
  • $\textrm{E}[y \vert x ; \theta] = \mu$ follows from $y \vert x ; \theta \sim \mathcal{N}(\mu, \sigma^{2})$.
  • $\mu = \eta$ converts the Gaussian's parameter into the Exponential Family's parameter.
  • $\eta = \theta^{T}x$ follows from the second assumption.
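
As a side note on where the name "least squares" comes from, here is a sketch under the assumptions used so far ($\sigma^{2} = 1$, $\mu = \theta^{T}x$). The training-set notation ($m$ examples $(x^{(i)}, y^{(i)})$ and the log-likelihood $\ell(\theta)$) is introduced here for illustration and does not appear in the text above.

$$ \begin{split} \ell(\theta) &= \sum_{i=1}^{m} \log{p(y^{(i)} \vert x^{(i)}; \theta)} \\ &= -\frac{m}{2}\log{(2\pi)} - \frac{1}{2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^{T}x^{(i)} \right)^{2} \end{split} $$

So maximizing the likelihood over $\theta$ is the same as minimizing the sum of squared errors.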

Relation between the three parameters

The $\mu$ above is called the canonical parameter. A canonical parameter is the parameter of the distribution that we designed for the purpose of the regression. For example, in the Bernoulli Distribution, $\phi$ is the canonical parameter.

The relation between the canonical parameter $\mu$ and the natural parameter $\eta$ is expressed by the canonical response function $g(\eta)$, and the inverse of the canonical response function is the canonical link function $g^{-1}(\mu)$.

$$\mu = \textrm{E}[y; \eta] = g(\eta)$$ $$\eta = g^{-1}(\mu)$$

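The relationship among the model parameter $\theta$, the natural parameter $\eta$, the canonical parameter $\mu$ or $\phi$, and the canonical functions $g(\eta)$ and $g^{-1}(\mu)$ can be summarized as a chain: the model parameter $\theta$ produces the natural parameter $\eta = \theta^{T}x$, the canonical response function $g$ maps $\eta$ to the canonical parameter, and the canonical link function $g^{-1}$ maps it back.

Let's take Logistic Regression as an example!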

$$ \begin{split} h_{\theta}(x) &= \textrm{E}[y \vert x; \theta] \\ &= \phi = g(\eta) = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-{\theta^{T}x}}} \end{split} $$

Remarkably, the whole GLM flow we have followed so far is already baked into the Logistic Regression hypothesis.

In other words, the sigmoid function did not come out of nowhere; it is what we derive when we interpret the Bernoulli Distribution as a GLM.
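
The chain from $\theta$ to $\eta$ to $\phi$ can be traced in a few lines of Python. This is only an illustrative sketch: the values of `theta` and `x` are hypothetical, and `response` / `link` are names chosen here for $g$ and $g^{-1}$.

```python
import math

def response(eta):
    """Canonical response function g(eta) = 1 / (1 + e^(-eta)), i.e. the sigmoid."""
    return 1.0 / (1.0 + math.exp(-eta))

def link(phi):
    """Canonical link function g^(-1)(phi) = log(phi / (1 - phi)), i.e. the logit."""
    return math.log(phi / (1.0 - phi))

theta = [0.5, -1.0]   # hypothetical model parameters
x = [1.0, 2.0]        # hypothetical input (first entry is the intercept term)

eta = sum(t * xi for t, xi in zip(theta, x))   # natural parameter: theta^T x = -1.5
phi = response(eta)                            # canonical parameter = h_theta(x)

# Applying the link function to phi recovers eta.
assert math.isclose(link(phi), eta)
```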


Closing remarks

By studying GLMs, we got a glimpse of the paradigm behind the linear regression models we have been using. Linear Regression and Logistic Regression cannot solve every problem in the world. We will go on to learn and use more complex and sophisticated regression models, and if you recognize that those models are built on the GLM paradigm, you have understood this post well.

Let's summarize GLMs.

  • Most of the distributions we assume in regression models belong to the Exponential Family.
  • The GLM organizes the common pattern of models whose probability distribution belongs to the Exponential Family.
  • The GLM tells us how the linear model should be connected to the distribution we designed.
    • For Logistic Regression, rewriting the Bernoulli distribution in Exponential Family form let us find the sigmoid function that connects $\theta^{T}x$ to the probability $\phi$.



  1. A phrase the lecturer used, and I really like it!!! Algebraic Massage means transforming the form of a probability distribution's expression.

  2. Poisson Distribution, Gamma Distribution, Dirichlet Distribution, and so on…

  3. Strictly speaking, the goal is to obtain the expected value of $T(y)$, but since $T(y)=y$ in most cases, we end up aiming for $\textrm{E}[y \vert x; \theta]$.