2021-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜ํ†ต๊ณ„์  ๋ฐ์ดํ„ฐ๋งˆ์ด๋‹โ€™ ์ˆ˜์—…์„ ๋“ฃ๊ณ  ๊ณต๋ถ€ํ•œ ๋ฐ”๋ฅผ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

5 minute read

2021-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜ํ†ต๊ณ„์  ๋ฐ์ดํ„ฐ๋งˆ์ด๋‹โ€™ ์ˆ˜์—…์„ ๋“ฃ๊ณ  ๊ณต๋ถ€ํ•œ ๋ฐ”๋ฅผ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

์šฐ๋ฆฌ๋Š” Feature์˜ ์ฐจ์›์ด ๋Š˜์–ด๋‚จ์— ๋”ฐ๋ผ <Curse of Dimensionality>๋ผ๋Š” ๋ฌธ์ œ๋ฅผ ๊ฐ–๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋งŽ์€ ๊ธฐ๋ฒ•๋“ค์ด ์ œ์‹œ๋˜์—ˆ๊ณ , ์ด๋ฒˆ ํฌ์ŠคํŠธ์—์„œ๋Š” ๊ทธ ๊ธฐ๋ฒ• ์ค‘ ์ „์ฒด feature์—์„œ ๋ช‡๊ฐœ๋ฅผ ์„ ํƒํ•ด ์‚ฌ์šฉํ•˜๋Š” <Feature Selection>์˜ ๊ธฐ๋ฒ•๋“ค์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•  ์˜ˆ์ •์ด๋‹ค ๐Ÿ˜

Best Subset SelectionPermalink

For given kโ‰คp, choose k input variables. This makes (nk) number of models. Find a model with which the residual mean square error is minimized among all models having k input variables. Denote the model as Mk.

Select the optimal model among M0,M1,โ€ฆ,Mp. For Mp, this means we use all the input variables.

๐Ÿ’ฅ ์ด๋•Œ, ์–ด๋–ค ๋ชจ๋ธ์ด ์ข‹์€์ง€๋Š” Trainin Err๊ฐ€ ์•„๋‹ˆ๋ผ Test Err๋ฅผ ๋ด์•ผ ํ•œ๋‹ค!

Prostate cancer dataset

<Best subset Selection>์˜ ๊ฒฐ๊ณผ๋ฅผ ์‚ดํŽด๋ณด๋ฉด, ๋” ์ ์€ feature๋ฅผ ์‚ฌ์šฉํ–ˆ์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  weight๋Š” ๋น„์Šทํ•˜๊ฒŒ ๋‚˜์™”๊ณ , ๋˜ ์˜ˆ์ธก ์„ฑ๋Šฅ ์—ญ์‹œ ์ „์ฒด feature๋ฅผ ์“ฐ๋Š” ๊ฒƒ๊ณผ ๋น„์Šทํ•œ ์ˆ˜์ค€์œผ๋กœ ๋‚˜์™”๋‹ค.

๋‹น์—ฐํ•˜๊ฒŒ๋„ ๋” ์ ์€ feature๋ฅผ ์ผ์œผ๋‹ˆ ๊ณ„์‚ฐ ์ธก๋ฉด์—์„œ๋„ ์ด๋“! ๐Ÿ˜


However, if p is large, say โ‰ฅ40, it becomes computationally infeasible.

(pk)=O(pk)

์ด๋Ÿฐ ๊ณ„์‚ฐ์ƒ์˜ ์ด์Šˆ ๋•Œ๋ฌธ์— <Best Subset Selection> ๋Œ€์‹  <Forward & Backward Selection> ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•œ๋‹ค.


Forward Stepwise SelectionPermalink

Start with the model M0 containing the intercept only.

Construct a sequence of models M1,โ€ฆ,Mp by sequentially adding the predictor that most improves the fit .

Choose the best model among M0,โ€ฆ,Mp.

์ฆ‰, input variable์„ ํ•˜๋‚˜ ์ถ”๊ฐ€ํ•  ๋•Œ, Test Err๊ฐ€ ๊ฐ€์žฅ ํฌ๊ฒŒ ๋‚ฎ์•„์ง€๋Š” ๋…€์„์„ ํ•˜๋‚˜ ๋„ฃ๊ฒ ๋‹ค๋Š” ๋ง์ด๋‹ค!


Backward Stepwise SelectionPermalink

Start with the full model Mp.

Construct a sequence of models Mpโˆ’1,โ€ฆ,M0 by sequentially deleting the predictor that has the least impact on the fit.

Choose the best model among Mp,โ€ฆ,M0.

์ฆ‰, input variable์„ ํ•˜๋‚˜ ์ œ๊ฑฐํ•  ๋•Œ, Test Err๊ฐ€ ์ œ์ผ ์ ๊ฒŒ ๋‚˜์˜ค๋Š” ๋…€์„์„ ๋บ€๋‹ค๋Š” ๋ง์ด๋‹ค!


Mallowโ€™s CpPermalink

<Mallowโ€™s Cp> ์ดํ•˜ Cp๋Š” ์–ด๋–ค ํ†ต๊ณ„ ๋ชจ๋ธ M๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ์ง€ํ‘œ ์ค‘ ํ•˜๋‚˜๋‹ค.

Cp(M)=1nโ‹…(โˆ‘i=1n(yiโˆ’y^)2+2dฯƒ^2)

์ฆ‰, Cp๋Š” Test Err์™€ ํ•จ๊ป˜ ๋ชจ๋ธ ํ”ผ์ฒ˜ ์ˆ˜ d๋ฅผ ๊ณ ๋ คํ•œ๋‹ค๋Š” ๋ง์ด๋‹ค!

๊ทธ๋ž˜์„œ ๋ชจ๋ธ์„ ์„ ํƒํ•  ๋•Œ, Cp๊ฐ€ ๊ฐ€์žฅ ๋‚ฎ์€ ๋ชจ๋ธ์„ ์„ ํƒํ•˜๋ฉด ๋œ๋‹ค.


AIC & BICPermalink

<AIC; Akaike Information Criterion>๊ณผ <BIC; Bayesian Information Criterion>์€ ์ข€๋” generalํ•œ model selection ์ง€ํ‘œ์ด๋‹ค.

<AIC> & <BIC>๋Š” <MLE> ๊ธฐ๋ฒ•๊ณผ๋„ ๊ด€๋ จ๋˜์–ด ์žˆ๋‹ค.

AIC(M)=โˆ’2nโ‹…loglik+2dn

์ด๋•Œ, loglik๋Š” โ€œlog-likelihoodโ€์˜ ์•ฝ์ž๋กœ,<AIC>๊ฐ€ MLE์— ์˜ํ•ด ์ตœ๋Œ€ํ™”๋œ โ€œlog-likelihoodโ€ ํ…€์— ๋ณ€์ˆ˜์˜ ๊ฐฏ์ˆ˜ d์— ๋”ฐ๋ฅธ ํŒจ๋„ํ‹ฐ๋ฅผ ์ถ”๊ฐ€ํ•œ ์ง€ํ‘œ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

ํ”ผ์ฒ˜๋ฅผ ๋งŽ์ด ์“ฐ๋Š” ๋ชจ๋ธ์ด๋ผ๋ฉด(d๊ฐ€ ํฐ) RSS๋Š” ์ž‘์•„์ง€๊ฒŒ ๋œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ AIC์™€ BIC๋Š” ํ”ผ์ฒ˜ ์ˆ˜ d๋ฅผ ํฌํ•จํ•˜๊ธฐ ๋•Œ๋ฌธ์— AIC/BIC๊ฐ€ ๊ฐ€์žฅ ์ž‘์€ ๋ชจ๋ธ์„ ์“ด๋‹ค๋Š” ๊ฒƒ์€ ์šฐ๋„(likelihood)๋ฅผ ๊ฐ€์žฅ ํฌ๊ฒŒ ํ•˜๋Š”(explainable) ๋™์‹œ์— ํ”ผ์ฒ˜ ๊ฐฏ์ˆ˜๋Š” ๊ฐ€์žฅ ์ ์€(parsimonious) ์ตœ์ ์˜ ๋ชจ๋ธ์„ ์“ด๋‹ค๋Š” ์˜๋ฏธ์ด๋‹ค.

BIC(M)=โˆ’2nโ‹…loglik+dโ‹…logโก(n)n

<AIC>์˜ ๊ฒฝ์šฐ <Mallowโ€™s Cp>์™€ ๋™์น˜๋ผ๊ณ  ํ•˜๋ฉฐ, <BIC>๊ฐ€ <AIC>๋ณด๋‹ค ๋” ์‹ฌํ”Œํ•œ ๋ชจ๋ธ์„ ์„ ํ˜ธํ•œ๋‹ค๊ณ  ํ•œ๋‹ค.


Instability of Variable SelectionPermalink

โ€œVariable selection methods are known to be unstable.โ€

- Breiman, L. (1996)

โ€œUnstableโ€ means that small change of data results in large change of the estimator.

This is because variable selection uses hard decision rule (hard survivie or die rule).

<Variable Selection>์˜ โ€˜instabilityโ€™ ๋ฌธ์ œ์— ๋Œ€ํ•œ ๋Œ€์•ˆ์œผ๋กœ ๋‹ค์Œ ํฌ์ŠคํŠธ์—์„œ ์†Œ๊ฐœํ•  <Shrinkage method>๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.


๐Ÿ‘‰ Shrinkage Method