This post collects what I studied in the 'Statistical Data Mining' course at university in Spring 2021. Corrections are always welcome :)


์ด์ „ ์ฑ•ํ„ฐ์—์„œ๋Š” parameter๋ฅผ ์ถ”๋ก ํ•ด linear regreeion & linear classification์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์‚ดํŽด๋ณด์•˜๋‹ค.

\[f(x) = E[Y \mid X = x] = \beta^T x\]

์ด๋ฒˆ ์ฑ•ํ„ฐ์—์„œ๋Š” non-linear model์— ๋Œ€ํ•ด ์‚ดํŽด๋ณผ ์˜ˆ์ •์ด๋ฉฐ, <Basis Expansion>๊ณผ <Kernel Method>์„ ์ค‘์‹ฌ์œผ๋กœ non-linear model์„ ์‚ดํŽด๋ณธ๋‹ค!


Basis Expansion

Definition. Basis expansion

For $m=1, \dots, M$, let $h_m: \mathbb{R}^p \rightarrow \mathbb{R}$ be the $m$-th transformation of $X$.

Then, we model $f$ as

\[f(X) = \sum^M_{m=1} \beta_m h_m (X)\]

with a linear basis expansion in $X$.


Example.

1. 1-dimensional projection

\[h_m(X) = X_m \quad \text{for} \quad m=1, \dots, p\]

2. Covariatic transform (not an official name; I call it this because it has a Cov-like feel.)

\[h_m (X) = X^2_j \quad \text{or} \quad h_m(X) = X_j X_k\]

3. log transform or root transform

\[h_m(X) = \log (X_j) \quad \text{or} \quad h_m (X) = \sqrt{X_j}\]

4. range transform

\[h_m(X) = I(L_m \le X_k < U_m)\]

A transform that chops $X_k$ up according to a specified range.


5. Splines ๐Ÿ”ฅ

Coming up shortly!


6. Wavelet bases

์ •๋ง ์ค‘์š”ํ•˜๊ฒŒ ์“ฐ์ด๋Š” ๋ฐฉ๋ฒ•์ด์ง€๋งŒ, ์ •๊ทœ ์ˆ˜์—…์—์„œ๋Š” ๋‹ค๋ฃจ์ง€ ์•Š์•˜๋‹ค.


Polynomial Regression

"Every smooth function can be approximated by polynomials!"
-- Weierstrass's Approximation Theorem

For $p$-dimensional $X$, the $m$-th order polynomial is given as

\[f(X) = \beta_0 + \left(\sum^p_{j=1} \beta_j X_j\right) + \left(\sum^{p^2}_{j,k} \beta_{jk} X_j X_k\right) + \cdots + \left(\sum^{p^m}_{j_1, \dots, j_m} \beta_{j_1, \dots, j_m} X_{j_1} \cdots X_{j_m} \right)\]
  • As $p$ increases, the # of parameters grows exponentially.
  • In general, it is difficult to estimate $p$-dimensional regression function for large $p$.
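Counting the terms exactly as the expansion above is written (one coefficient per ordered index tuple at each order) makes the growth concrete — a quick illustrative check of my own:

```python
# Number of coefficients in the order-m polynomial expansion as written above:
# 1 + p + p^2 + ... + p^m, one term per ordered index tuple at each order.
def n_terms(p: int, m: int) -> int:
    return sum(p ** j for j in range(m + 1))

print(n_terms(2, 3), n_terms(10, 3), n_terms(100, 3))  # 15 1111 1010101
```

The number of *distinct* monomials is smaller (repeated indices coincide), but it still blows up like $p^m$.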

This $m$-th order polynomial approach also suffers from <Multi-collinearity>. 👇

๋…ผ์˜์˜ ํŽธ์˜๋ฅผ ์œ„ํ•ด $p=1$๋ผ๊ณ  ํ•˜์ž. ๊ทธ๋ฆฌ๊ณ , $X_i$๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ๋””์ž์ธ ํ•˜์ž.

Let $X_1 \sim \text{Unif}(0, 1)$, and $X_m = X_1^m$ for $m = 2, 3, 4$.

Computing the correlations among $X_1, \dots, X_4$ gives

\[\text{Corr}(X_1, \dots, X_4) = \begin{bmatrix} 1.00 & 0.97 & 0.92 & 0.86 \\ 0.97 & 1.00 & 0.98 & 0.95 \\ 0.92 & 0.98 & 1.00 & 0.99 \\ 0.86 & 0.95 & 0.99 & 1.00 \end{bmatrix}\]

์™€ ๊ฐ™์€๋ฐ, Corr์˜ ๊ฐ’์„ ์‚ดํŽด๋ณด๋ฉด, ๋ชจ๋‘ 1์— ๊ฐ€๊นŒ์šด ๊ฐ’์„ ๊ฐ€์ง„๋‹ค. ์ด๊ฒƒ์€ ๊ฐ ๋…๋ฆฝ๋ณ€์ˆ˜ ์‚ฌ์ด์— ๊ฐ•ํ•œ ์ƒ๊ด€๊ด€๊ณ„๊ฐ€ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•˜๋ฉฐ, ์ด๊ฒƒ์„ <Multi-collinearity; ๋‹ค์ค‘๊ณต์„ ์„ฑ>์ด๋ผ๊ณ  ํ•œ๋‹ค.

  • High order polynomials are often unstable at the boundary.

โ†’ bad!!! ๐Ÿ˜ฅ


<Polynomial Regression>์—์„œ์˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ํ•จ์ˆ˜๋ฅผ local๋กœ ๋ถ„ํ• ํ•ด ๊ทผ์‚ฌํ•˜๋Š” <Local Polynomial Regression> ๋ฐฉ์‹์ด ์ œ์•ˆ ๋˜์—ˆ๋‹ค.


Local Polynomial Regression

"Every smooth function can be locally approximated by low dimensional polynomials!"
-- Taylor's Theorem

Suppose that $p=1$. Divide the domain interval $(a, b)$ into $K+1$ sub-intervals $(a=\xi_0, \xi_1], (\xi_1, \xi_2], \dots, (\xi_K , b=\xi_{K+1})$.

Then, a local polynomial regression model is given as

\[f(X) = \sum^{K+1}_{i=1} f_{i}(X) \cdot I(\xi_{i-1} < X \le \xi_i)\]

where $f_i$ are polynomials and $\xi_i$ are the knots.

์•ž์—์„œ ์‚ดํŽด๋ณธ <Polynomial Regression>๊ณผ ๋น„๊ตํ–ˆ์„ ๋•Œ, ๋ฒ”์œ„ ์ „์ฒด๋ฅผ ๋„๋ฉ”์ธ์œผ๋กœ ๊ฐ–๋Š” polynomial์„ ์ทจํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ domain interval์„ ๋ถ„ํ• ํ•ด polynomial fitting์„ ํ•œ๋‹ค๋Š” ์ ์ด ๋‹ค๋ฅด๋‹ค.


Example. Constant & Linear

These are the results of modeling the $f_i$ as constant and as linear functions. As the figures show, the fit is not continuous near the knots, and this discontinuity persists even as the order grows. 😥
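As a sketch of the constant case (my own toy data and knots, not the lecture's figure): on each sub-interval $(\xi_{i-1}, \xi_i]$ the least squares constant fit is just the mean of the responses falling in it, so the fitted curve jumps at the knots.

```python
import numpy as np

# Local polynomial regression with each f_i a constant:
# on (xi_{i-1}, xi_i] the least squares fit is the local mean of y.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=300)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=300)

knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # a = xi_0, ..., b = xi_{K+1}

# np.digitize maps each x to the index of its sub-interval
idx = np.digitize(x, knots[1:-1])
local_means = np.array([y[idx == i].mean() for i in range(len(knots) - 1)])
print(np.round(local_means, 2))  # one constant level per interval
```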

<Regression Spline>์€ <Local Polynomial Regression>์— โ€œcontinuous & continuous derivative์— ๋Œ€ํ•œ ์ œ์•ฝโ€์„ ์ฃผ์–ด non-continuous ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค!


Regression Spline

Definition. Regression Polynomial Spline

Given a fixed knot sequence $\xi_1, \dots, \xi_K$, a function defined on an open interval $(a, b)$ is called a <regression (polynomial) spline of order $M$> if it is a piecewise polynomial of order $M$ on each of the intervals

\[(a=\xi_0, \xi_1], (\xi_1, \xi_2], \dots, (\xi_K , b=\xi_{K+1})\]

and has continuous derivatives up to order $(M-1)$.

์•ž์—์„œ ์‚ดํŽด๋ณธ <Local Polynomial Regression>์˜ not-continuous ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด <Regression Spline>์—์„œ๋Š” piecewise polynomial์ด $(M-1)$์ฐจ ์—ฐ์†์ธ ๋„ํ•จ์ˆ˜๋ฅผ ๊ฐ€์ ธ์•ผ ํ•œ๋‹ค๋Š” ์กฐ๊ฑด์ด ์ถ”๊ฐ€๋œ ๊ฒƒ์ด๋‹ค!!



Example. 3rd-order polynomial ($M=3$)

์™ผ์ชฝ์€ ์•ž์—์„œ ์‚ดํŽด๋ณธ <Local polynomial regression>์˜ ๋ชจ๋ธ์ด๋‹ค. knot์—์„œ not-continuousํ•˜๋‹ค.

์˜ค๋ฅธ์ชฝ์€ knot์—์„œ continuousํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ œ์•ฝ์„ ์ค€ ๋ชจ๋ธ์ด๋‹ค.

For each cubic piece $f_i(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$ on $(\xi_{i-1}, \xi_i]$, this adds the condition

\[f_i(\xi_i) = f_{i+1}(\xi_i)\]

at the knot. Consequently, once the values of $b_3, b_2, b_1$ for $f_{i+1}$ are chosen, the constraint $f_i(\xi_i) = f_{i+1}(\xi_i)$ automatically determines $b_0$.

\[a_3 x^3 + a_2 x^2 + a_1 x + a_0 = b_3 x^3 + b_2 x^2 + b_1 x + \cancelto{\text{calculated by constraint}}{b_0}\]
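Evaluating both sides of the continuity constraint at $x = \xi_i$ makes the determined value explicit:

\[b_0 = (a_3 - b_3)\,\xi_i^3 + (a_2 - b_2)\,\xi_i^2 + (a_1 - b_1)\,\xi_i + a_0\]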

๋”ฐ๋ผ์„œ, estimate ํ•ด์•ผ ํ•˜๋Š” Coefficients์˜ ๊ฐฏ์ˆ˜๋Š”

\[(M+1)(K+1) - (K-1)\]

(ex: If $M=3$ and $K=2$, then we need to estimate $4 + 4 - 1$ coefficients.)



Example. 3rd-order polynomial with 1st & 2nd derivative

์ด๋ฒˆ์—๋Š” โ€œknot์—์„œ continuousโ€ ์กฐ๊ฑด๊ณผ ํ•จ๊ป˜ โ€œknot์—์„œ 1st/2nd derivatives continuousโ€ ์กฐ๊ฑด์ด ์ถ”๊ฐ€๋˜์—ˆ๋‹ค.

To satisfy this, we need $f_i'(\xi_i) = f_{i+1}'(\xi_i)$, which amounts to

\[3 a_3 x^2 + 2 a_2 x + a_1 = 3 b_3 x^2 + 2 b_2 x + \cancelto{\text{calculated by constraint}}{b_1}\]

so the constraint pins down the value of $b_1$. Second-derivative continuity works the same way (it determines $b_2$).

๋”ฐ๋ผ์„œ, estimate ํ•ด์•ผ ํ•˜๋Š” Coefficients์˜ ๊ฐฏ์ˆ˜๋Š”

\[(M+1)(K+1) - MK = M + K + 1\]



Spline basis function

Notation.

\[x_{+} = \begin{cases} x & \text{if} \quad x > 0 \\ 0 & \text{else} \end{cases}\]

or

\[(x-\xi)_{+}^M = \begin{cases} (x-\xi)^M & \text{if} \quad x > \xi \\ 0 & \text{else} \end{cases}\]

Example.

If $M=1$ (linear) and $K=2$, the spline basis functions are $\{\,1,\; x,\; (x-\xi_1)_+,\; (x-\xi_2)_+\,\}$.


If $M=3$ (cubic) and $K=2$, the spline basis functions are $\{\,1,\; x,\; x^2,\; x^3,\; (x-\xi_1)^3_+,\; (x-\xi_2)^3_+\,\}$.
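Both bases can be built directly from the truncated power notation above; note the column counts match the $M + K + 1$ coefficient count derived earlier. (The knot locations here are my own illustrative choice.)

```python
import numpy as np

def truncated_power_basis(x, knots, M):
    """Order-M spline basis: 1, x, ..., x^M and (x - xi)_+^M per knot."""
    cols = [x ** j for j in range(M + 1)]
    cols += [np.clip(x - xi, 0.0, None) ** M for xi in knots]
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 101)
knots = [0.3, 0.7]  # K = 2

H1 = truncated_power_basis(x, knots, M=1)  # linear spline basis
H3 = truncated_power_basis(x, knots, M=3)  # cubic spline basis
print(H1.shape[1], H3.shape[1])  # 4 6, i.e. M + K + 1 columns each
```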


Natural Cubic Spline

Even the seemingly perfect <Regression Spline> has a small problem: the regression behaves poorly at the two boundaries. To resolve this, the <Natural cubic spline> models the two ends as linear.

Definition. Natural Cubic Spline

A cubic spline is called a <natural cubic spline>, if it is linear beyond the boundary knots $\xi_1$ and $\xi_K$.

(Source: Stanford - STAT202)

๊ทธ๋ฆผ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด ๋†’์€ ์ฐจ์›์˜ $M$์œผ๋กœ ๊ทผ์‚ฌํ•  ์ˆ˜๋ก <Regression Spline>์€ boundary์—์„œ ์„ฑ๋Šฅ์ด ์ €ํ•˜๋˜๋Š” ๊ฑธ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. <Natural Cubic Spline>์€ ์–‘๋์—์„œ linear๋กœ ๊ทผ์‚ฌํ•จ์œผ๋กœ์จ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค!

<Natural Cubic Spline>์—์„œ estimate ํ•ด์•ผ ํ•˜๋Š” Coefficient ์ˆ˜๋Š” $M=3$์ด๋ฏ€๋กœ

\[(M + K + 1) - (2 \times 2) = (3 + K + 1) - 4 = K\]

Smoothing Splines

<knot selection>์€ Spline Method์˜ ์ฃผ๋œ ์ด์Šˆ์ด๋‹ค. <smoothing spline>์€ ์ด ๋ฌธ์ œ๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ํ•ด๊ฒฐํ•œ๋‹ค!

Consider $\hat{f} = \underset{f\in\mathcal{F}}{\text{argmin}} \; \text{RSS}_{\lambda}(f)$, where

\[\text{RSS}_{\lambda} \; (f) = \left(\sum^n_{i=1} \left\{ y_i - f(x_i)\right\}^2\right) + \lambda \int \left\{ f''(t) \right\}^2 \; dt\]

and $\lambda$ is a fixed smoothing parameter.

์œ„์˜ ์‹์„ ์ž˜ ์‚ดํŽด๋ณด๋ฉด, RSS ์‹์— ํŒจ๋„ํ‹ฐ ํ…€์ด ์žˆ๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์ด๋•Œ, ํŒจ๋„ํ‹ฐ ํ…€์—์„œ๋Š” $fโ€™'(t)$๋ฅผ ์ ๋ถ„ํ•˜๋Š”๋ฐ, ์ด๊ฒƒ์€ ํ•จ์ˆ˜ $f(t)$์— ๋Œ€ํ•œ <๊ณก๋ฅ ; curverture>๋ฅผ ์˜๋ฏธํ•œ๋‹ค. $(fโ€™'(t))^2$๋Š” ๊ณก๋ฅ ์˜ ์ ˆ๋Œ“๊ฐ’์ด๋ฉฐ, ๋”ฐ๋ผ์„œ ํŒจ๋„ํ‹ฐ ํ…€์€ ํ•จ์ˆ˜ $f(t)$๊ฐ€ ์–ผ๋งˆ๋‚˜ flunctuation์ด ์‹ฌํ•œ์ง€๋ฅผ ์ธก์ •ํ•˜๋Š” ํ…€์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

์œ„์˜ ์ตœ์ ํ™” ๋ฌธ์ œ์˜ solution์ธ $\hat{f}$๋ฅผ <smoothing spline estimator>๋ผ๊ณ  ํ•œ๋‹ค!

1. If $\lambda = 0$,
then $\hat{f}$ can be any function that interpolates the data.

2. If $\lambda = \infty$,
then $\hat{f}$ is the least squares line fit. (Because $f''(t) = 0$ everywhere forces $f(t)$ to be a linear function!)

3. (general case) small $\lambda$
model complexity $\uparrow$ / bias $\downarrow$ / variance $\uparrow$
(= the fit fluctuates more!)

4. (general case) large $\lambda$
model complexity $\downarrow$ / bias $\uparrow$ / variance $\downarrow$

If $\mathcal{F}$ is an appropriate <Sobolev space> of functions, the smoothing spline turns out to be a natural cubic spline.
(ESL, Exercise 5.7)


์•ž์—์„œ ์‚ดํŽด๋ณธ Smoothing Spline์— ๋Œ€ํ•œ ์‹์€ infinite-dimensional problem์ด์—ˆ๋‹ค ์ด๊ฒƒ์„ linear fit ๋ฐฉ์‹์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜๋„ ์žˆ๋Š”๋ฐ, ๊ทธ ์‹์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[f(\theta; x) = \sum^n_{j=1} \theta_j N_j (x)\]

where $N_j$ are basis functions of natural cubic splines.

Then,

\[\text{RSS}_{\lambda} \; (\theta) = (\mathbf{y} - \mathbf{N}\theta)^T (\mathbf{y} - \mathbf{N}\theta) + \lambda \cdot \theta^T \mathbf{\Omega} \theta\]

where \(\mathbf{N}_{ij} = N_j(x_i)\) and $\displaystyle\mathbf{\Omega}_{ij} = \int N_i'' (t) \, N_j''(t) \; dt$.

The notation looks a bit busy, but it is just the $f(\theta; x)$ above written into the smoothing spline objective!

์œ„์˜ $\text{RSS}_{\lambda}(\theta)$๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์€ ๊ทธ๋ƒฅ $\theta$๋กœ ๋ฏธ๋ถ„ํ•ด์„œ ์ตœ์†Œ๊ฐ€ ๋˜๋Š” $\theta$๋ฅผ ๊ตฌํ•ด์ฃผ๋ฉด ๋œ๋‹ค.

\[\hat{f}(x) = \sum^n_{j=1} \hat{\theta}_j N_j(x)\]

where

\[\hat{\theta} = (\mathbf{N}^T \mathbf{N} + \lambda \mathbf{\Omega})^{-1}\mathbf{N}^T \mathbf{y}\]

๋‚ด๋ถ€์— regularization ํ…€์œผ๋กœ ์ธํ•ด $\lambda \mathbf{\Omega}$๊ฐ€ ์ƒ๊ฒผ๋‹ค!



์ด์–ด์ง€๋Š” ํฌ์ŠคํŠธ์—์„œ๋Š” spline method์˜ ๋‚จ์€ ๋‚ด์šฉ์„ ์ข€๋” ์‚ดํŽด๋ณธ๋‹ค. ๐Ÿคฉ

๐Ÿ‘‰ Spline Method (2)