2021-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜ํ†ต๊ณ„์  ๋ฐ์ดํ„ฐ๋งˆ์ด๋‹โ€™ ์ˆ˜์—…์„ ๋“ฃ๊ณ  ๊ณต๋ถ€ํ•œ ๋ฐ”๋ฅผ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)


Additive Model

Definition. Additive Model

The regression model

\[E (Y \mid X_1, \dots, X_p) = \alpha + f_1(X_1) + \cdots + f_p(X_p)\]

is called an <additive model>.

It assumes that there is no interaction effect.

Therefore, it can effectively avoid "the curse of dimensionality".


โœจ Goal: How can we estimate $f_i(x_i)$?

์ด๋•Œ ์“ฐ๋Š” ์ ‘๊ทผ๋ฒ•์ด ๋ฐ”๋กœ <Backfitting Algorithm>์ด๋‹ค.


๋ณดํ†ต ์šฐ๋ฆฌ๊ฐ€ $f_j$๋ฅผ ์ œ์™ธํ•œ ๋‚˜๋จธ์ง€ $f_k$์— ๋Œ€ํ•œ ํ•จ์ˆ˜๋ฅผ ์•Œ๊ณ  ์žˆ์„ ๋•Œ, $f_j$๋ฅผ ์ถ”์ •ํ•˜๋Š” ๊ฒƒ์€ ์•„์ฃผ ์‰ฝ๋‹ค. ๊ทธ๋ƒฅ 1์ฐจ์› ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค!

์ด๋ ‡๊ฒŒ ๋‹ค๋ฅธ ํ•จ์ˆ˜๋ฅผ fix ์‹œ์ผœ๋‘๊ณ , ํ•จ์ˆ˜ ํ•˜๋‚˜๋ฅผ fitting ํ•˜๋Š” ๊ธฐ๋ฒ•์„ <Backfitting Algorithm>์ด๋ผ๊ณ  ํ•œ๋‹ค.โ€™

Algorithm. Backfitting Algorithm

1. Initialize:
- $\hat{\alpha} = \bar{y}$
- $\hat{f}_j = 0$ for all $j$

2. Repeat until convergence: for each $j = 1, \dots, p$,

find an estimator $\hat{f}_j$ based on \(\left\{ x_{ij}, \; y_i - \hat{\alpha} - \displaystyle \sum_{k\ne j} \hat{f}_k(x_{ik}) \right\}^n_{i=1}\)

๐Ÿ’ฅ ์ด๋•Œ, 2๋ฒˆ์งธ ์Šคํ…์—์„œ <smoothing spline>์ด๋‚˜ <kernel method> ๋“ฑ์˜ ๋‹ค๋ฅธ non-parameteric method๋“ค์„ ์ ์šฉํ•ด๋ณผ ์ˆ˜๋„ ์žˆ๋‹ค.

<Backfitting Algorithm>์€ Convex optimization๊ณผ ๋น„์Šทํ•˜๋‹ค๊ณ  ํ•˜๋ฉฐ, ๊ต‰์žฅํžˆ ๋น ๋ฅด๊ฒŒ ์ˆ˜๋ ดํ•œ๋‹ค๊ณ  ํ•œ๋‹ค! ๐Ÿ˜ฒ


GAM; Generalized Additive Model

<GAM; Generalized Additive Model>์€ ๊ฐ•๋ ฅํ•˜๋ฉด์„œ๋„ ๊ฐ„๋‹จํ•œ ํ†ต๊ณ„์  ํ…Œํฌ๋‹‰ ์ค‘ ํ•˜๋‚˜์ด๋‹ค. 1986๋…„, ESL์˜ ๊ณต๋™์ €์ž์ธ โ€œTrevor Hastieโ€์™€ โ€œRobert Tibshiraniโ€์— ์˜ํ•ด ๊ฐœ๋ฐœ๋œ ๋ฐฉ๋ฒ•์ด๋‹ค.

Relationships between the individual predictors and the dependent variable follow smooth patterns that can be linear or non-linear.

์ฆ‰, GAM์€ <Additive Model>์—์„œ $f_j$๊ฐ€ smooth non-parametric์ธ ๋ชจ๋ธ์ด๋‹ค!


โ€˜DataCampโ€™์˜ ์œ ํŠœ๋ธŒ ์˜์ƒ์—์„œ๋Š” <GAM>์ด <Linear Model>๊ณผ <Bloack-BOX ML> ๋ชจ๋ธ์˜ ์ค‘๊ฐ„ ์ •๋„์— ์œ„์น˜ํ•˜๋Š” ๋ชจ๋ธ์ด๋ผ๊ณ  ์†Œ๊ฐœํ•œ๋‹ค.

ํ†ต๊ณ„์  ๋ชจ๋ธ์€ <Interpretability>์™€ <Flexibility>์— trade-off๊ฐ€ ์žˆ๋Š”๋ฐ, ์™ผํŽธ๊ณผ ์˜ค๋ฅธํŽธ์ด ๊ฐ๊ฐ์„ ์˜๋ฏธํ•œ๋‹ค.

<GAM>์€ ๋”ฑ ์ค‘๊ฐ„ ์ •๋„์— ์œ„์น˜ํ•œ ๋ชจ๋ธ๋กœ, ์ ๋‹นํ•œ <Interpretability>์™€ ์ ๋‹นํ•œ <Flexibility>๋ฅผ ์ œ๊ณตํ•œ๋‹ค.

์œ„์˜ ๊ทธ๋ฆผ์€ <GAM>์—์„œ ์‚ฌ์šฉ๋œ <smooth basis function>๋“ค์„ ํ‘œํ˜„ํ•œ ๊ฒƒ์ด๋‹ค. ์™ผ์ชฝ ๊ทธ๋ฆผ์€ ๋ชจ๋“  basis func.์— ๋™์ผํ•œ coeff.๋ฅผ ์ค€ ๊ทธ๋ฆผ์ด๊ณ , ์˜ค๋ฅธ์ชฝ ๊ทธ๋ฆผ์€ ํ•™์Šต์„ ํ†ตํ•ด ๊ฐ basis func.์— ํŠœ๋‹๋œ coeff.๋ฅผ ์ค€ ๊ทธ๋ฆผ์ด๋‹ค.


โ€˜multithreadedโ€™์— ๊ฒŒ์‹œ๋œ ํฌ์ŠคํŠธ์—์„œ๋Š” GAM์˜ ์žฅ์ ์œผ๋กœ

(1) Interpretability
(2) Flexibility & Automation
(3) Regularization

์„ ๊ผฝ๋Š”๋‹ค.

์ด ์ค‘์—์„œ ๋จผ์ € โ€œRegularizationโ€œ๋ถ€ํ„ฐ ์‚ดํŽด๋ณด์ž. smooth function์„ ์ถ”์ •ํ•˜๋Š” GAM์€ โ€œsmoothnessโ€๋ฅผ ์ปจํŠธ๋กค ํ•˜๋Š” tuning parameter $\lambda$๊ฐ€ ์กด์žฌํ•œ๋‹ค. ์ด๊ฒƒ์„ ํ†ตํ•ด overfitting์„ ๋ฐฉ์ง€ํ•˜๊ณ , ์ „์ฒด predictor๊ฐ€ wiggle ํ•ด์ง€๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•œ๋‹ค.

โ€˜DataCampโ€™์˜ ์œ ํŠœ๋ธŒ ์˜์ƒ์— ๋”ฐ๋ฅด๋ฉด, <GAM>์€ ์•„๋ž˜์˜ ์ˆ˜์‹์— ๋”ฐ๋ผ ๋ชจ๋ธ์˜ Wiggliness๋ฅผ ์กฐ์ •ํ•œ๋‹ค๊ณ  ํ•œ๋‹ค.

\[\text{Fit} = \text{Likelihood} - \lambda \times \text{Wiggliness}\]

๋˜๋Š” basis func.์˜ ์ˆ˜๋กœ๋„ smoothness๋ฅผ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.


<GAM>์€ input feature ์ˆ˜๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ์ผ ๋•Œ๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ์ด๋•Œ, ๊ฐ input feature๊ฐ€ independent ํ•˜๋‹ค๋Š” <additive model>์˜ ๊ฐ€์ •์„ ์‚ฌ์šฉํ•œ๋‹ค!

์ž์„ธํ•œ ๋‚ด์šฉ์€ โ€˜DataCampโ€™์˜ ์˜์ƒ์„ ํ†ตํ•ด ์‚ดํŽด๋ณด์ž.

๐Ÿ‘‰ [YouTube] R Tutorial: Multivariate GAMs


์ฐธ๊ณ ์ž๋ฃŒ