This post is my personal summary of the “Computer Vision” course from the second semester of 2020. Corrections are always welcome :)


💥 (Before we start) In SVM, we assume the class labels are encoded as $\{ -1, +1\}$.

Introduction to SVM

Given a linearly separable set of data points, there are infinitely many hyper-planes that split the two classes. <SVM; Support Vector Machine> is a classification algorithm that finds which of these infinitely many hyper-planes is the best one.

<SVM> defines the best hyper-plane as follows.

The hyper-plane that maximizes the margin!

In other words, <SVM> looks for the hyper-plane that maximizes the “margin”. So what is the “margin”? Put simply, it is the distance from a hyper-plane that linearly separates the data to the data point closest to it.

In the figure above, both $B_1$ and $B_2$ split the dataset correctly, but $B_1$ separates it with more room to spare than $B_2$. How much room a hyper-plane leaves can be quantified as the distance between the hyper-plane and its closest data point, and that number is exactly the “margin”.

To derive an expression for the “margin”, let us define the hyper-plane as follows.

\[w^T x + b = 0\]

Here, $w$ is the normal vector of the hyper-plane.

With the hyper-plane defined, the “margin” follows easily from the point-to-plane distance formula.

\[\text{dist}(x_0) = \frac{ | w^T x_0 + b | }{ \| w \| }\]

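* To keep track of which side of the hyper-plane the point lies on, simply drop the absolute value in the numerator!

Alternatively, since $y_0 \in \{ -1, +1 \}$, the absolute value can be replaced by a multiplication with the label $y_0$; for a correctly classified point this gives the same value.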

\[\text{dist}(x_0) = \frac{ y_0 \cdot (w^T x_0 + b) }{ \| w \| }\]

In fact, the notion of “margin” we use day to day is the numerator of the formula above, $y_0 \cdot (w^T x_0 + b)$ (often called the functional margin). This “margin” is always positive when the point is classified correctly. For an SVM on linearly separable data, this margin is positive for every data point!

Based on the point-to-plane distance formula above, the “margin”, i.e. 'the minimal distance', can be written as follows.

\[\begin{aligned} \text{margin} &= \min_i \left[ \text{dist}(x_i) \right] \\ &= \min_i \left[ \frac{ y_i \cdot (w^T x_i + b) }{ \| w \| } \right] \\ &= \frac{1}{\| w \|} \min_i \left[ y_i \cdot (w^T x_i + b) \right] \end{aligned}\]
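The formula above can be checked numerically with a small sketch. The toy dataset and the two candidate hyper-planes below are made up for illustration (they are not the $B_1$ and $B_2$ of the figure), and the helper names `signed_distance` and `margin` are my own.

```python
import math

def signed_distance(w, b, x, y):
    """Signed distance y * (w.x + b) / ||w|| of a labelled point to the hyper-plane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, X, Y):
    """Margin = smallest signed distance over the whole dataset."""
    return min(signed_distance(w, b, x, y) for x, y in zip(X, Y))

# Toy 2-D data (assumed for illustration): two points per class.
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
Y = [+1, +1, -1, -1]

# Two hypothetical separating hyper-planes.
margin_1 = margin((1.0, 1.0), 0.0, X, Y)  # plane x1 + x2 = 0
margin_2 = margin((1.0, 0.0), 0.0, X, Y)  # plane x1 = 0

print(margin_1, margin_2)
```

Both planes separate the data, but the first one leaves the larger margin, so it is the better hyper-plane in the SVM sense.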

Using the expression for the “margin” derived above, the optimization problem of <SVM> can be stated as follows.

\[\underset{w, b}{\text{argmax}} \left[ \frac{1}{\| w \|} \min_i \left[ y_i \cdot (w^T x_i + b) \right] \right]\]

Now that the optimization problem of <SVM> is defined, let's find its solution!


Convex Optimization

\[\underset{w, b}{\text{argmax}} \left[ \frac{1}{\| w \|} \min_i \left[ y_i \cdot (w^T x_i + b) \right] \right]\]

First, we apply a small normalization to the <SVM> objective.

The reason is that $w^T x + b = 0$ and $c(w^T x + b) = 0$ define the same hyper-plane for any $c \ne 0$ (here we only need $c > 0$, which keeps the orientation), so fixing the scale reduces the degrees of freedom of the problem and makes it easier to solve!

We normalize the hyper-plane equation by choosing $w$ and $b$ that satisfy the conditions below.

\[w^T x_{+} + b = 1 \quad \text{and} \quad w^T x_{-} + b = -1\]

Here, $x_{+}$ and $x_{-}$ are the points of each label that attain the “margin”, i.e. the points closest to the hyper-plane. We call these points “support vectors”!

ps) At first I did not quite see why it is possible to normalize so that the two support vectors take the values $\pm1$, but on reflection it is enough to place the hyper-plane so that the two support vectors have the same margin (see footnote 1). Seen another way, normalizing like this also acts as a constraint on the optimization problem.

With this choice, $\min_i \left[ y_i \cdot (w^T x_i + b) \right] = 1$, so the following holds immediately.

\[\text{margin} = \frac{1}{\| w \|}\]
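As a sanity check of this normalization, here is a minimal sketch. It assumes a toy dataset and a hyper-plane that already sits midway between the two classes; the helper names `functional_margins` and `normalize` are hypothetical. It rescales $(w, b)$ by a positive constant so that the closest points score exactly $\pm 1$, then compares the margin with $1 / \| w \|$.

```python
import math

def functional_margins(w, b, X, Y):
    """y_i * (w.x_i + b) for every labelled point."""
    return [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in zip(X, Y)]

def normalize(w, b, X, Y):
    """Rescale (w, b) by c = 1 / min_i y_i (w.x_i + b) so the closest points score 1."""
    smallest = min(functional_margins(w, b, X, Y))
    if smallest <= 0:
        raise ValueError("the hyper-plane does not separate the data")
    c = 1.0 / smallest
    return tuple(c * wi for wi in w), c * b

# Toy data (assumed) and a separating plane x1 + x2 - 1 = 0, written with an arbitrary scale of 3.
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
Y = [+1, +1, -1, -1]
w, b = (3.0, 3.0), -3.0

w_n, b_n = normalize(w, b, X, Y)
norm_w = math.sqrt(sum(wi * wi for wi in w_n))

# Geometric margin computed directly, and via the shortcut 1 / ||w||.
direct = min(functional_margins(w_n, b_n, X, Y)) / norm_w
shortcut = 1.0 / norm_w
print(direct, shortcut)
```

The rescaling changes $w$ and $b$ but not the hyper-plane itself, which is why the two numbers agree.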

Based on this, the <SVM> objective can be rewritten as below. Because we set the value at the “support vectors” to $\pm1$, every other point must satisfy $y_i (w^T x_i + b) \ge 1$, which adds a “constraint” term to the original problem.

\[\underset{w, b}{\text{argmax}} \frac{1}{\| w \|} \cdot 1 \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 \;\; \forall i\]

The problem above is equivalent to the following convex optimization problem.

\[{\color{red}{\underset{w, b}{\text{argmin}} \frac{1}{2} \| w \|^2}} \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 \;\; \forall i\]
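The equivalence can be spelled out in one line. The map $t \mapsto 1/t$ is decreasing for $t > 0$ and $t \mapsto \frac{1}{2} t^2$ is increasing for $t \ge 0$, so under the same constraints all three problems share the same optimal $(w, b)$. The factor $\frac{1}{2}$ is only a convenience that cancels when differentiating.

\[\underset{w, b}{\text{argmax}} \; \frac{1}{\| w \|} \;=\; \underset{w, b}{\text{argmin}} \; \| w \| \;=\; \underset{w, b}{\text{argmin}} \; \frac{1}{2} \| w \|^2\]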

Dual Problem

Through the steps above, we have cast <SVM> as a “Convex Optimization” problem.

\[\min_{w, b} \frac{1}{2} \| w \|^2 \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 \;\; \forall i\]

Introducing a “Lagrange multiplier” $\lambda_i$ for each constraint of this “Convex Optimization” problem yields a new optimization problem, called the <Dual Problem>. For <SVM> it looks as follows.

\[\max_{\lambda} \left[ \min_{w, b} L(w, b, \lambda) \right] \quad \text{where} \quad L(w, b, \lambda) = \frac{1}{2} \| w \|^2 - \sum_{i=1}^n {\color{red}{\lambda_i}} \{ y_i (w^T x_i + b) - 1 \} \quad \text{and} \quad \lambda_i \ge 0\]

By introducing the Lagrange multipliers $\lambda_i$, the constraints of the original problem are absorbed into $L(w, b, \lambda)$.

The expression looks much more complicated than before, but it is surprisingly easy to solve!! 😲 The inner minimization is handled by setting the partial derivatives with respect to $w$ and $b$ to zero.

\[\frac{\partial L(w, b, \lambda)}{\partial w} = w - \sum_{i=1}^n \lambda_i y_i x_i = 0 \quad \iff \quad w = \sum_{i=1}^n \lambda_i y_i x_i\] \[\frac{\partial L(w, b, \lambda)}{\partial b} = 0 - \sum_{i=1}^n \lambda_i y_i = 0 \quad \iff \quad \sum_{i=1}^n \lambda_i y_i = 0\]

Wow, isn't that simple?? Because the Lagrange multipliers absorbed the constraints, the inner minimization over $w$ and $b$ can be solved with nothing more than partial derivatives!! 😆


But the problem is not fully solved yet. We found the optimal $w$, but it is expressed in terms of $\lambda_i$, so it is not yet a complete solution. The steps above merely peeled one layer off the original <Dual Problem>, leaving the following.

\[\begin{aligned} &\max_{\lambda} \left[ \frac{1}{2} \| w \|^2 - \sum_{i=1}^n \lambda_i \{ y_i (w^T x_i + b) - 1 \} \right] \\ &\text{where} \quad w = \sum_{i=1}^n \lambda_i y_i x_i \quad \text{and} \quad \sum_{i=1}^n \lambda_i y_i = 0 \quad \text{and} \quad \lambda_i \ge 0 \end{aligned}\]

Applying $\sum \lambda_i y_i = 0$ to the expression above removes the $b$ term, since $\sum_{i=1}^n \lambda_i y_i b = b \sum_{i=1}^n \lambda_i y_i = 0$, and the expression becomes:

\[\max_{\lambda} \left[ \frac{1}{2} \| w \|^2 - \sum_{i=1}^n \lambda_i ( y_i w^T x_i - 1 ) \right] = \max_{\lambda} \left[ \frac{1}{2} \| w \|^2 - \sum_{i=1}^n \lambda_i y_i w^T x_i + \sum_{i=1}^n \lambda_i \right]\]

Next, substitute $w = \sum \lambda_i y_i x_i$.

\[\begin{aligned} \max_{\lambda} \left[ \frac{1}{2} \| w \|^2 - \sum_{i=1}^n \lambda_i y_i w^T x_i + \sum_{i=1}^n \lambda_i \right] &= \max_{\lambda} \left[ \frac{1}{2} \| \sum_{i=1}^n \lambda_i y_i x_i \|^2 - \sum_{i=1}^n \lambda_i y_i \sum_{j=1}^n \lambda_j y_j x_j^T x_i + \sum_{i=1}^n \lambda_i \right] \\ &= \max_{\lambda} \left[ \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j y_i y_j x_i^T x_j - \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j y_i y_j x_i^T x_j + \sum_{i=1}^n \lambda_i \right] \\ &= \max_{\lambda} \left[ \sum_{i=1}^n \lambda_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j y_i y_j x_i^T x_j \right] \end{aligned}\]
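The algebra above is easy to get wrong by a sign, so here is a small numeric check. The data and the multipliers are arbitrary illustrative values (only chosen so that $\lambda_i \ge 0$ and $\sum \lambda_i y_i = 0$); they are not an optimal solution.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy data (assumed) and arbitrary multipliers with lam_i >= 0 and sum(lam_i * y_i) = 0.
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
Y = [+1, +1, -1, -1]
lam = [0.3, 0.1, 0.25, 0.15]
n = len(X)

# w = sum_i lam_i y_i x_i (stationarity condition in w).
w = tuple(sum(lam[i] * Y[i] * X[i][d] for i in range(n)) for d in range(2))

# Expression before the substitution (the b term has already been dropped).
before = 0.5 * dot(w, w) - sum(lam[i] * Y[i] * dot(w, X[i]) for i in range(n)) + sum(lam)

# Expression after the substitution: the final dual objective.
quad = sum(lam[i] * lam[j] * Y[i] * Y[j] * dot(X[i], X[j])
           for i in range(n) for j in range(n))
after = sum(lam) - 0.5 * quad

print(before, after)
```

Both values coincide, which is what the chain of equalities claims for any $\lambda$ satisfying the two conditions.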

Cleaning up, we get the following.

\[\begin{aligned} \max_{\lambda} \left[ \sum_{i=1}^n \lambda_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j y_i y_j x_i^T x_j \right] \\ \text{where} \quad \lambda_i \ge 0 \quad \text{and} \quad \sum_{i=1}^n \lambda_i y_i = 0 \end{aligned}\]

The solution of this optimization problem can be obtained with <QP; Quadratic Programming>, and the resulting solution $\lambda^{*}$ is given below.

\[\begin{aligned} \lambda^{*} = \underset{\lambda}{\text{argmax}} \; \left[ \sum_{i=1}^n \lambda_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j y_i y_j x_i^T x_j \right] \\ \text{where} \quad \lambda_i \ge 0 \quad \text{and} \quad \sum_{i=1}^n \lambda_i y_i = 0 \end{aligned}\]
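To see the dual at work without a QP solver, consider a minimal sketch on a two-point dataset (assumed for illustration). With one point per class, the constraint $\sum \lambda_i y_i = 0$ forces $\lambda_1 = \lambda_2$, so the problem becomes one-dimensional and a simple grid scan stands in for quadratic programming. This shortcut is specific to the toy case and is not how a real solver works.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(lam, X, Y):
    """sum_i lam_i - 1/2 * sum_ij lam_i lam_j y_i y_j x_i.x_j"""
    n = len(X)
    quad = sum(lam[i] * lam[j] * Y[i] * Y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(lam) - 0.5 * quad

# One point per class (toy data).
X = [(1.0, 1.0), (-1.0, -1.0)]
Y = [+1, -1]

# lam_1 = lam_2 = t >= 0 satisfies both constraints, so scan over t only.
grid = [k / 1000 for k in range(0, 1001)]
t_star = max(grid, key=lambda t: dual_objective([t, t], X, Y))
lam_star = [t_star, t_star]

# Recover w* = sum_i lam*_i y_i x_i, then b* from y_s (w.x_s + b) = 1 at a support vector.
w_star = tuple(sum(lam_star[i] * Y[i] * X[i][d] for i in range(len(X))) for d in range(2))
b_star = Y[0] - dot(w_star, X[0])

print(lam_star, w_star, b_star)
```

Here the dual objective reduces to $2t - 4t^2$, which peaks at $t = 0.25$, giving the hyper-plane $0.5 x_1 + 0.5 x_2 = 0$ that bisects the segment between the two points.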

Now, substituting the solution $\lambda^{*}$ gives the solutions for $w$ and $b$.

\[w^{*} = \sum_{i=1}^n \lambda^{*}_i y_i x_i\]

Here each $\lambda^{*}_i$ is either zero or positive:

  • If $\lambda^{*}_i = 0$, then $x_i$ does not contribute to defining the hyper-plane.
  • If $\lambda^{*}_i > 0$, then $x_i$ contributes to defining the hyper-plane, and we call it a “support vector”! By complementary slackness, such a point satisfies $y_i (w^T x_i + b) = 1$.

$b^{*}$ can be derived from the equation $w^T x_{+} + b = 1$. I will not write out the formula separately.
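For readers who want it spelled out anyway, one common way to write it is the following, where $(x_s, y_s)$ denotes any support vector (the subscript $s$ is my own notation). Since $y_s (w^{*T} x_s + b^{*}) = 1$ and $y_s^2 = 1$, multiplying both sides by $y_s$ gives:

\[b^{*} = y_s - w^{*T} x_s \quad \text{for any } s \text{ with } \lambda^{*}_s > 0\]

For a positive support vector $x_{+}$ this reduces to $b^{*} = 1 - w^{*T} x_{+}$, which is the equation mentioned above.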


Soft-margin SVM

If the dataset is not linearly separable, the <SVM> above has no solution, because no $(w, b)$ can satisfy all the constraints at once! 🤯 To handle this case, we introduce <slack variables> $\xi_i$! As a result, the <SVM> formulation becomes the following optimization problem.

\[\begin{aligned} \min_{w, b, \xi} \frac{1}{2} \| w \|^2 &+ C \sum_{i=1}^n \xi_i\\ \text{subject to} &\quad y_i (w^T x_i + b) \ge 1 - \xi_i \;\; \forall i, \quad \text{and} \quad \xi_i \ge 0 \end{aligned}\]

This allows a few data points to lie further inside than the margin region formed by the support vectors! The constant $C > 0$ controls how heavily such violations are penalized.

\[y_i (w^T x_i + b) \ge 1 - \xi_i\]

Put simply, for the data points that make the dataset non-separable, $\xi_i$ takes a positive value, which can be read as tolerating a somewhat smaller margin for those points.
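To make the role of $\xi_i$ concrete: for a fixed $(w, b)$, the smallest slack that satisfies both constraints is $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$. The sketch below evaluates it on a made-up dataset with a hand-picked hyper-plane and $C = 1$. All of these values are assumptions for illustration, and the hyper-plane is not claimed to be optimal.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slacks(w, b, X, Y):
    """Smallest feasible slack per point: max(0, 1 - y_i * (w.x_i + b))."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(X, Y)]

def soft_margin_objective(w, b, X, Y, C):
    """1/2 * ||w||^2 + C * sum_i xi_i, with each xi_i at its smallest feasible value."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, Y))

# Toy data (assumed): the second point sits inside the margin, the fourth is misclassified.
X = [(2.0, 2.0), (0.5, 0.0), (-1.0, -1.0), (0.2, 0.2)]
Y = [+1, +1, -1, -1]
w, b = (0.5, 0.5), 0.0

xi = slacks(w, b, X, Y)
objective = soft_margin_objective(w, b, X, Y, C=1.0)
print(xi, objective)
```

Points on or outside the margin get $\xi_i = 0$, a point inside the margin but on the correct side gets $0 < \xi_i < 1$, and a misclassified point gets $\xi_i > 1$.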


Non-Linear SVM


  1. Of course, the support vector on one side could end up closer to the hyper-plane than the one on the other side, but that goes against the spirit of SVM, so we rule it out.