Prerequisite Concepts

Definition. Variance

\[\text{Var}(X) = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})^2\]

Definition. Covariance

\[\text{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})\]

Definition. Correlation

\[\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)} \sqrt{\text{Var}(Y)}}\]
  • Correlation takes values in the range $\left[ -1, 1 \right]$.
  • The correlation defined above is called the Pearson Correlation $r_{XY}$.
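The three definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the toy data are assumptions made for this example, and the population form (division by $N$) follows the formulas above.

```python
import numpy as np

def variance(x):
    """Population variance: mean squared deviation from the mean (divides by N)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 2)

def covariance(x, y):
    """Population covariance: mean product of the deviations of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / np.sqrt(variance(x) * variance(y))

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = correlation(x, y)
print(f"Var(X) = {variance(x):.3f}, Cov(X, Y) = {covariance(x, y):.3f}, Corr(X, Y) = {r:.3f}")
```

Because the factor $1/N$ cancels in the ratio, the result should agree with `np.corrcoef(x, y)[0, 1]`.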

Definition. Partial Correlation

\[\rho_{XY\cdot \mathbf{z}} = \text{Corr}(e_{X}, e_{Y})\]

where $e_{X}$ and $e_{Y}$ are the residuals from the multiple regressions of $X$ and $Y$, respectively, on $\mathbf{z}$.
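A rough sketch of this definition, assuming ordinary least squares with an intercept for the regression step. The helper names `residuals` and `partial_corr` and the synthetic data are hypothetical, introduced only for this example.

```python
import numpy as np

def residuals(target, Z):
    """Residuals of an ordinary least-squares fit of target on Z (with intercept)."""
    target = np.asarray(target, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    A = np.column_stack([np.ones(len(Z)), Z])   # intercept column plus controls
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def partial_corr(x, y, Z):
    """Correlation of x and y after regressing both on Z."""
    return np.corrcoef(residuals(x, Z), residuals(y, Z))[0, 1]

# Synthetic data (an assumption for illustration): x and y are related
# only through the shared variable z.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + 0.5 * rng.normal(size=2000)
y = z + 0.5 * rng.normal(size=2000)

raw = np.corrcoef(x, y)[0, 1]
partial = partial_corr(x, y, z)
print(f"Corr(x, y) = {raw:.3f}, partial correlation given z = {partial:.3f}")
```

In this setup the plain correlation is high, while the partial correlation is close to zero, since nothing links `x` and `y` once `z` is accounted for.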

What is Auto-Correlation?

In time series data $\{ s(t) \}$, it is uncommon for $s(t)$ to suddenly jump far above, or drop far below, the values at the preceding time steps $s(t-1)$ and $s(t-2)$.

Correlation $\text{Corr}(X, Y)$ is originally a measure of the relationship between two different random variables $X$ and $Y$. In time series data, however, we compute the Auto-Correlation $\text{Corr}(s(t), s(t-1))$ to measure the relationship between a value and its own previous value.

\[\text{Corr}(s(t), s(t-1)) = \frac{\text{Cov}(s(t), s(t-1))}{\sqrt{\text{Var}(s(t))} \sqrt{\text{Var}(s(t-1))}} = \frac{\text{Cov}(s(t), s(t-1))}{\text{Var}(s(t))}\]

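The last equality assumes that the variance does not change over time, so that $\text{Var}(s(t)) = \text{Var}(s(t-1))$. The formula and the concept of Auto-Correlation are not difficult. Writing the time series at $t$ and $t-1$ side by side gives the following: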

| $t$ | $s(t)$ | $s(t-1)$ |
|-----|--------|----------|
| 1   | 11     | 10       |
| 2   | 12     | 11       |
| 3   | 14     | 12       |
| 4   | 16     | 14       |
| 5   | 20     | 16       |

If the time series $s(t)$ follows a pattern like the one above, the Auto-Correlation $\text{Corr}(s(t), s(t-1))$ will be positive.
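The sign can be checked directly on the five rows of the table. The snippet below is a small sketch that treats the two columns as paired samples and uses NumPy's `corrcoef`; the variable names are chosen for this example.

```python
import numpy as np

# The series behind the table: s(0) = 10, then s(1)..s(5).
s = np.array([10, 11, 12, 14, 16, 20], dtype=float)

current = s[1:]     # s(t)   for t = 1..5 -> 11, 12, 14, 16, 20
previous = s[:-1]   # s(t-1) for t = 1..5 -> 10, 11, 12, 14, 16

r = np.corrcoef(current, previous)[0, 1]
print(f"Corr(s(t), s(t-1)) = {r:.3f}")  # positive and close to 1
```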

The Auto-Correlation formula can be generalized into the Auto-Correlation Function (ACF). The ACF $\text{ACF}(k)$, which measures the correlation with the value $k$ steps earlier, is defined as follows.

\[\text{ACF}(k) = \frac{\text{Cov}(s(t), s(t-k))}{\text{Var}(s(t))}\]
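One common way to estimate this quantity from a finite sample is sketched below. It assumes a single overall mean and variance for the whole series, which matches the $\text{Cov}/\text{Var}$ form above only when the series is roughly stationary. The function name `acf` and the sine-wave test signal are assumptions made for illustration.

```python
import numpy as np

def acf(s, k):
    """Sample ACF at lag k, using the overall mean and variance of the series."""
    s = np.asarray(s, dtype=float)
    d = s - s.mean()                  # deviations from the overall mean
    if k == 0:
        return 1.0
    # Sum of lagged products divided by the total sum of squares.
    return np.sum(d[k:] * d[:-k]) / np.sum(d ** 2)

# Hypothetical test signal: a sine wave with a period of 8 steps.
t = np.arange(200)
wave = np.sin(2 * np.pi * t / 8)

for k in (0, 4, 8):
    print(f"ACF({k}) = {acf(wave, k):.2f}")
```

For this signal the ACF is strongly negative at half a period (lag 4) and strongly positive at a full period (lag 8), which is the kind of structure the ACF is meant to expose.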

Example: goog200

The ACF of goog200, a stock-price time series, looks as follows.

Overall, the series shows a high positive correlation with its earlier values.

Partial ACF

$\text{ACF}(k)$ returns the correlation between two values, $s(t)$ and $s(t-k)$. But couldn't the values in between, from $s(t-1)$ to $s(t-(k-1))$, have had an influence as well? 🤔

If $s(t)$ and $s(t-1)$ are correlated, then $s(t-1)$ and $s(t-2)$ are correlated too. It follows naturally that $s(t)$ and $s(t-2)$ will also be correlated, even if $s(t-2)$ affects $s(t)$ only through $s(t-1)$.
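As a hedged illustration of this indirect effect, assume (this model is an assumption for the example, not something stated above) a stationary series in which each value depends only on the previous one, $s(t) = \phi \, s(t-1) + \varepsilon(t)$ with $\lvert \phi \rvert < 1$ and noise $\varepsilon(t)$ uncorrelated with past values, so that $\text{ACF}(1) = \phi$. Then

\[\begin{aligned} \text{Cov}(s(t), s(t-2)) &= \text{Cov}(\phi \, s(t-1) + \varepsilon(t), \; s(t-2)) = \phi \, \text{Cov}(s(t-1), s(t-2)) \\ \text{ACF}(2) &= \phi \cdot \text{ACF}(1) = \phi^2 \end{aligned}\]

So $\text{ACF}(2)$ is nonzero even though, in this model, $s(t-2)$ has no direct effect on $s(t)$.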


There is a concept called <Partial Correlation>. The details are written up in the "Partial Correlation" post, but in short: when there are several independent variables that are correlated with one another to some degree, it excludes the correlation among those independent variables and measures the correlation of a single independent variable with the dependent variable on its own.


Partial ACF $\text{PACF}(k)$ also measures the correlation between $s(t)$ and $s(t-k)$, just like $\text{ACF}(k)$. However, $\text{PACF}(k)$ measures that correlation after excluding the influence of the values between $s(t)$ and $s(t-k)$, from $s(t-1)$ to $s(t-(k-1))$!

Example

Let's draw the PACF plot, again using the goog200 stock-price time series.

This time, unlike the ACF, only $\text{PACF}(1)$ shows a large correlation. From this we can infer that the meaningful relationship is the one between $s(t)$ and $s(t-1)$! 😀

Derivation

Deriving the PACF is the same as deriving the Partial Correlation $\rho_{XY\cdot \mathbf{z}}$. Fit a linear regression on the independent variables whose influence we want to exclude, then compute the correlation of the residuals! 👍

Simple Case

Let's start with the simple case $k=2$. We want $\text{PACF}(2)$, that is, the Partial Auto-Correlation between $s(t)$ and $s(t-2)$.

First, fit the linear regressions below.

\[\begin{aligned} w^{\ast}_{s(t)} &= \underset{w}{\text{argmin}} \left\{ \sum_{i} (s(i) - w \cdot s(i-1))^2 \right\} \\ w^{\ast}_{s(t-2)} &= \underset{w}{\text{argmin}} \left\{ \sum_{i} (s(i-2) - w \cdot s(i-1))^2 \right\} \end{aligned}\]

Personally, the part I could not understand when working through the PACF formula was this: $s(t)$ is fitted on $s(t-1)$, but $s(t-2)$ is also fitted on $s(t-1)$, which comes after it. It is not fitted on its own previous step $s(t-3)$!

This was resolved once I understood the definition of <Partial Correlation>. To get the Partial Correlation between $s(t)$ and $s(t-2)$, we are removing the influence of the other independent variable between them, $s(t-1)$, so fitting both variables on $s(t-1)$ is correct! 😀

Now compute the residuals:

\[\begin{aligned} e_{s(t), i} &= s(i) - w^{\ast}_{s(t)} \cdot s(i-1) \\ e_{s(t-2), i} &= s(i-2) - w^{\ast}_{s(t-2)} \cdot s(i-1) \end{aligned}\]

Finally, compute the correlation of the residuals.

\[\text{PACF}(2) = \text{Corr} \left(e_{s(t)}, e_{s(t-2)} \right)\]
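The three steps above (fit, residuals, correlation) can be sketched as follows. Two assumptions are made here that the equations do not state: the series is centered first, so that the no-intercept fit $w \cdot s(i-1)$ is reasonable, and the minimizer is computed in closed form as $w^{\ast} = \sum_i s(i-1)\,y_i / \sum_i s(i-1)^2$. The function name `pacf2` and the simulated series are hypothetical.

```python
import numpy as np

def pacf2(s):
    """PACF(2) via the residual method: regress s(i) and s(i-2) on s(i-1)."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()                          # center (assumption, see lead-in)
    cur, mid, old = s[2:], s[1:-1], s[:-2]    # s(i), s(i-1), s(i-2)

    # Closed-form least-squares slopes for the two no-intercept fits.
    w_cur = np.dot(mid, cur) / np.dot(mid, mid)
    w_old = np.dot(mid, old) / np.dot(mid, mid)

    # Residuals after removing the part explained by s(i-1).
    e_cur = cur - w_cur * mid
    e_old = old - w_old * mid
    return np.corrcoef(e_cur, e_old)[0, 1]

# Simulated series (an assumption for illustration): each value depends
# directly only on the previous one.
rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=n)
s = np.zeros(n)
for t in range(1, n):
    s[t] = 0.8 * s[t - 1] + noise[t]

lag2 = np.corrcoef(s[2:], s[:-2])[0, 1]
p2 = pacf2(s)
print(f"Corr(s(t), s(t-2)) = {lag2:.3f}, PACF(2) = {p2:.3f}")
```

In this simulation the plain lag-2 correlation is clearly positive, while the value from `pacf2` is near zero, because the link between $s(t)$ and $s(t-2)$ runs entirely through $s(t-1)$.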

Generalization

Now let's derive the formula for the general $\text{PACF}(k)$. By now you should be familiar with how <Partial Correlation> works, so I will write the formula directly.

Definition. Partial ACF

\[\text{PACF}(k) = \text{Corr}(s(t) - \hat{s}(t), \; s(t - k) - \hat{s}(t - k))\]

where $\hat{s}(t)$ and $\hat{s}(t-k)$ are the linear combinations of $\left\{ s(t-1), s(t-2), \dots, s(t-(k-1))\right\}$ that minimize the mean squared error for $s(t)$ and $s(t-k)$, respectively.
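A minimal sketch of this general definition, assuming ordinary least squares with an intercept for both fits. Since goog200 is not available here, a simulated random walk stands in for a stock-price series. Both the function name `pacf` and the stand-in data are assumptions made for this example.

```python
import numpy as np

def pacf(s, k):
    """PACF(k) via residuals: regress s(t) and s(t-k) on the k-1 values between them."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    if k == 1:
        # Nothing lies between s(t) and s(t-1), so this is the plain lag-1 correlation.
        return np.corrcoef(s[1:], s[:-1])[0, 1]

    cur = s[k:]         # s(t)
    old = s[:n - k]     # s(t-k)
    # Design matrix: intercept plus s(t-1), ..., s(t-(k-1)).
    Z = np.column_stack([np.ones(n - k)] + [s[k - j:n - j] for j in range(1, k)])

    e_cur = cur - Z @ np.linalg.lstsq(Z, cur, rcond=None)[0]
    e_old = old - Z @ np.linalg.lstsq(Z, old, rcond=None)[0]
    return np.corrcoef(e_cur, e_old)[0, 1]

# Stand-in for a stock-price series (assumption): a simulated random walk.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=2000))

values = [pacf(walk, k) for k in range(1, 5)]
for k, v in enumerate(values, start=1):
    print(f"PACF({k}) = {v:.3f}")
```

On a series like this, only the first lag should come out large, which is the same qualitative pattern described for the goog200 PACF plot.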


Closing Remarks

ACF and PACF are among the techniques used when doing EDA on time series data. The ACF and PACF plots are used to decide which time series model to apply.

To use ACF and PACF properly, you first need to know which time series models exist. Study the models below first.

  • AR(Auto-Regressive) Model
  • MA(Moving Average) Model
  • ARMA Model
