This post summarizes the Machine Learning (CS229) course taught by Prof. Andrew Ng at Stanford Univ. in the second semester of 2018. Corrections are always welcome :)

– lecture 4


Decision Boundary

Linear classification produces a decision boundary that separates two classes in feature space. This post works through what that decision boundary is.


Feature Space & Decision Boundary (= Hyperplane)

First, let us define the feature space: it is the space in which the input $x$ lives. If $x \in \mathbb{R}^{n}$, the feature space is $\mathbb{R}^{n}$, and $x$ is a single point in it.

Picture something like the figure below.[1]

The decision boundary is then the hyperplane that separates the two classes in feature space.[2]

Let us recall how a plane is defined in space.

Two ingredients are needed:

  1. a point $P_0$ that the plane passes through
  2. a normal vector $\vec{w}$ to the plane at that point

Every point $x$ on the plane satisfies $\vec{w} \cdot (x - P_0) = 0$, so the plane can be written as $\vec{w} \cdot x + b = 0$ with $b = -\vec{w} \cdot P_0$.

So, to obtain the hyperplane boundary $\vec{w} \cdot x + b = 0$ we are looking for, we need to find a suitable $\vec{w}$ and $b$ that separate the two classes well.
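As a minimal sketch of the plane definition above, the following builds $(\vec{w}, b)$ from a point and a normal vector and checks which side of the plane a point falls on. The helper names (`dot`, `hyperplane_from_point_normal`, `side`) and the 2D numbers are illustrative assumptions, not part of the lecture.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def hyperplane_from_point_normal(p0, w):
    """Return (w, b) such that the plane is {x : w . x + b = 0}."""
    b = -dot(w, p0)  # follows from w . (x - p0) = 0
    return w, b


def side(w, b, x):
    """Signed value w . x + b: zero on the plane, positive on the side w points to."""
    return dot(w, x) + b


# Hypothetical example: plane through P0 = (1, 1) with normal w = (1, 2).
w, b = hyperplane_from_point_normal([1.0, 1.0], [1.0, 2.0])
on_plane = side(w, b, [1.0, 1.0])   # P0 itself lies on the plane
positive = side(w, b, [3.0, 2.0])   # a point on the side w points to
negative = side(w, b, [0.0, 0.0])   # a point on the opposite side
```

The sign of `side(w, b, x)` is what a linear classifier uses to assign a class, so finding a good boundary comes down to choosing `w` and `b`.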

Linear Classification

Linear classification relates the input $x$ to the parameter $\theta$ through $\theta^{T}x$. Here $\vec{w} \cdot x + b$ is simply $\theta^{T}x$ written in another form, with $b$ absorbed into $\theta$ as the coefficient of a constant feature $x_0 = 1$.
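To make the correspondence between the two notations concrete, here is a small sketch. It assumes the usual convention of prepending a constant feature $x_0 = 1$ so that $\theta = (b, w_1, \dots, w_n)$; the function names and numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def augment(x):
    """Prepend the constant feature x_0 = 1 (an assumed convention)."""
    return [1.0] + list(x)


def theta_from(w, b):
    """Pack the bias b and the normal vector w into one parameter vector."""
    return [b] + list(w)


# Hypothetical numbers for illustration only.
w, b = [1.0, 2.0], -3.0
x = [3.0, 2.0]

theta = theta_from(w, b)
lhs = dot(w, x) + b              # w . x + b
rhs = dot(theta, augment(x))     # theta^T x with x_0 = 1
```

Both expressions evaluate to the same number, so any statement about the sign of $\vec{w} \cdot x + b$ carries over to $\theta^{T}x$.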

In the previous part we looked at logistic regression, which uses the hypothesis $h_{\theta}(x) = \frac{1}{1 + e^{-\theta^{T}x}}$. The sigmoid function itself is non-linear. However, because $\theta$ and $x$ are combined linearly through $\theta^{T}x$, logistic regression is still a linear classifier.
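One way to see why the sigmoid does not change the shape of the boundary: thresholding $h_{\theta}(x)$ at 0.5 gives the same answer as checking the sign of $\theta^{T}x$. The sketch below assumes a 0.5 threshold and inputs that already include the constant feature $x_0 = 1$; the function names and the value of `theta` are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(theta, x):
    """theta^T x, where x already includes the constant feature x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def predict_by_probability(theta, x):
    """Predict 1 when h_theta(x) >= 0.5 (assumed threshold)."""
    return 1 if sigmoid(linear_score(theta, x)) >= 0.5 else 0


def predict_by_sign(theta, x):
    """Predict 1 when theta^T x >= 0."""
    return 1 if linear_score(theta, x) >= 0 else 0


# Hypothetical parameter vector (b, w1, w2) = (-3, 1, 2).
theta = [-3.0, 1.0, 2.0]
points = [[1.0, 3.0, 2.0], [1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, -2.0, 5.0]]
agreement = [predict_by_probability(theta, x) == predict_by_sign(theta, x)
             for x in points]
```

Because the sigmoid is monotonic and equals 0.5 at zero, the set where $h_{\theta}(x) = 0.5$ is exactly the hyperplane $\theta^{T}x = 0$.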

For reference, a non-linear classifier replaces the purely linear combination $\theta^{T}x$ with one that includes terms such as $x_j^2$ or $x_i x_j$.
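As a hedged illustration of such higher-order terms, the sketch below maps a 2D input to quadratic features and uses a hand-picked parameter vector whose boundary is the unit circle. The feature map, the parameter values, and the function names are assumptions chosen for the example, not something taken from the lecture.

```python
def quadratic_features(x):
    """Map (x1, x2) to [1, x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def classify(theta, x):
    """Return 1 if the score on the quadratic features is positive, else 0."""
    z = sum(t * f for t, f in zip(theta, quadratic_features(x)))
    return 1 if z > 0 else 0


# Hypothetical parameters: score = x1^2 + x2^2 - 1, so the boundary is the unit circle.
theta = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

inside = classify(theta, [0.0, 0.0])    # inside the circle
right = classify(theta, [2.0, 0.0])     # outside, to the right
left = classify(theta, [-2.0, 0.0])     # outside, to the left
```

No single hyperplane in the original $(x_1, x_2)$ space can put both outside points on one side and the origin on the other, which is why this boundary counts as non-linear in $x$.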

Interpretation of Decision Boundary with Learning

The previous section showed that linear classification amounts to finding a hyperplane that partitions feature space. This section looks at how the hyperplane relates to learning.

First, $\theta$ is the normal vector of the hyperplane. Suppose that, relative to the hyperplane, we interpret $\theta^{T}x > 0$ as a malignant tumor and $\theta^{T}x < 0$ as a benign tumor.

Then the hyperplane looks as follows.

However, this model misses one circle (○). Checking the inner product of this particular input $x_j$ with $\theta$ gives

$\theta^{T}{x_j} < 0$. But this contradicts the label $y_j = 1$ that $x_j$ should have! So we have to update $\theta$ using the learning rule.

Updating $\theta$ to $\theta'$ defines a new hyperplane. This hyperplane classifies every circle (○) as $(\theta')^T x > 0$.
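The text does not spell out which learning rule is used here. As a sketch under that caveat, the example below assumes the stochastic gradient ascent rule for logistic regression, $\theta := \theta + \alpha\,(y - h_{\theta}(x))\,x$, and applies it to a single misclassified point until its score becomes positive. The starting `theta`, the point `x_j`, and the learning rate `alpha` are hypothetical values.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def update(theta, x, y, alpha):
    """One step of the assumed rule: theta + alpha * (y - h_theta(x)) * x."""
    error = y - sigmoid(dot(theta, x))
    return [t + alpha * error * xi for t, xi in zip(theta, x)]


# Hypothetical setup: x_j includes x_0 = 1, has label y_j = 1, but scores negative.
theta = [-3.0, 1.0, 2.0]
x_j, y_j = [1.0, 1.0, 0.5], 1
alpha = 1.0  # learning rate chosen only for this illustration

theta_new = list(theta)
steps = 0
# Repeat the update on this one point until it lands on the positive side.
while dot(theta_new, x_j) <= 0 and steps < 100:
    theta_new = update(theta_new, x_j, y_j, alpha)
    steps += 1
```

Each update moves $\theta$ in the direction of $x_j$, which rotates and shifts the hyperplane toward classifying $x_j$ correctly. In practice the rule is applied over all training examples, since fitting one point in isolation can disturb the others.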


Closing Remarks

  • Linear classification splits feature space with a hyperplane.
    • The logistic regression we saw is also a linear classifier.
    • Whether a classifier is linear or non-linear is determined by how the parameter $\theta$ and the input $x$ are combined.
  • In linear classification, the parameter $\theta$ is the normal vector of the hyperplane.
  • Updating $\theta$ through learning changes the orientation (slope) of the hyperplane.

  1. Source: Frames of reference and their neural correlates within navigation in a 3D environment (M. Vavrecka, et al., 2012)

  2. Note, however, that the statement "every decision boundary is a hyperplane" is false. In the linear binary classification treated here, the boundary does take the form of a hyperplane.