PCA
These are notes from the "Statistical Data Mining" course I took in graduate school during the 2021 spring semester. Corrections are always welcome :)
Introduction to PCA
<PCA; Principal Component Analysis> is one of the major techniques used for dimensionality reduction.
✨ Goal ✨
To find a low-dimensional representation of the input variables that contains most of the information in the data
Image from "stackoverflow"
So what does "contains most information" mean? In <PCA>, it means preserving as much of the data's variance as possible! A direction that preserves the variance of the data in this way is called a <principal component direction>, and it is exactly what we need to find in order to perform <PCA>!
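As a quick numerical illustration of this idea (a minimal sketch on synthetic data; every name and number below is made up for the example), projecting correlated 2-D data onto different unit directions shows that some directions retain far more variance than others:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data whose two variables are strongly correlated.
n = 1000
x1 = rng.normal(size=n)
X = np.column_stack([x1, 0.9 * x1 + 0.3 * rng.normal(size=n)])
X = X - X.mean(axis=0)  # center each column

# Variance of the projection X @ phi for two different unit directions.
for phi in (np.array([1.0, 1.0]) / np.sqrt(2),    # roughly along the point cloud
            np.array([1.0, -1.0]) / np.sqrt(2)):  # roughly across it
    z = X @ phi
    print(phi, "variance of projection:", z.var())
```

The first direction runs along the point cloud and keeps most of the spread; the second runs across it and keeps very little.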
First Principal Component
For a design matrix $X = \left[ X_1, X_2, \dots, X_p \right]$, the first principal component $Z_1$ is the normalized linear combination:
\[Z_1 = \phi_{11} X_1 + \cdots + \phi_{1p}X_p\]that has the largest variance.
By normalization, $\sum^p_{j=1} \phi_{1j}^2 = 1$; because of this normalization, the direction of the principal component is uniquely defined (up to sign).
The coefficient vector $\phi_1 = (\phi_{11}, \dots, \phi_{1p})^T$ is called the "loading vector of the first principal component".
Q. Given an $n\times p$ design matrix $X$, how can we estimate $\phi_1$??
Let $X = (X_1, \dots, X_p)^T$, and w.l.o.g. let's assume that $E[X] = 0$ by standardization and $\text{Var}(X) = \Sigma$.
Then, for $Z_1 = \phi_{11}X_1 + \cdots + \phi_{1p}X_p$, we want to maximize $\text{Var}(Z_1)$!
\[\begin{aligned} \text{maximize} \; \text{Var}(Z_1) &= \underset{\phi_{11}, \dots, \phi_{1p}}{\text{maximize}} \; \text{Var}(\phi_{11}X_1 + \cdots + \phi_{1p}X_p) \\ &= \underset{\phi_{11}, \dots, \phi_{1p}}{\text{maximize}} \; \text{Var}(\phi_1^T X) \\ &= \underset{\phi_{11}, \dots, \phi_{1p}}{\text{maximize}} \; \phi_1^T \text{Var}(X) \phi_1 \\ &= \underset{\phi_{11}, \dots, \phi_{1p}}{\text{maximize}} \; \phi_1^T \Sigma \phi_1 \quad \text{s.t.} \quad \phi_{11}^2 + \cdots + \phi_{1p}^2 = 1 \end{aligned}\]Here, since the covariance matrix $\Sigma$ is a symmetric matrix, it admits a <spectral decomposition>!
\[\Sigma = UDU^T\]Moreover, the covariance matrix $\Sigma$ is a positive definite matrix (assuming no variable is an exact linear combination of the others).
\[x^T \Sigma x > 0 \quad \text{for all} \; x \ne 0\]Accordingly, every eigenvalue is positive.
\[\Sigma = UDU^T = d_1 u_1 u_1^T + \cdots + d_p u_p u_p^T\]w.l.o.g. $0 < d_1 \le \cdots \le d_p$.
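Here is a minimal numpy sketch of this decomposition (synthetic data; note that `np.linalg.eigh` returns eigenvalues in ascending order, matching the convention $d_1 \le \cdots \le d_p$ above):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
Sigma = np.cov(A, rowvar=False)  # a symmetric sample covariance matrix

# Spectral decomposition Sigma = U D U^T; eigh sorts eigenvalues ascending.
d, U = np.linalg.eigh(Sigma)

# Rebuild Sigma from the rank-one pieces d_i * u_i u_i^T.
recon = sum(d[i] * np.outer(U[:, i], U[:, i]) for i in range(len(d)))
print(np.allclose(Sigma, recon))        # True: the decomposition holds
print(np.allclose(U.T @ U, np.eye(3)))  # True: U is orthogonal
print(d)                                # all positive for full-rank data
```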
Now that we have examined the covariance matrix $\Sigma$ in enough detail, let's express $\phi$ in terms of $\Sigma$. Using the orthogonal matrix $U$ of $\Sigma$, we can expand $\phi$ in the $U$-basis as follows.
\[\phi = (u_1^T\phi) u_1 + \cdots + (u_p^T \phi) u_p = U \begin{pmatrix} u_1^T \phi \\ \vdots \\ u_p^T \phi \end{pmatrix}\]Here, since $\phi^T \phi = 1$, we have $(u_1^T\phi)^2 + \cdots + (u_p^T \phi)^2 = 1$.
\[\begin{aligned} \phi^T \Sigma \phi &= (u_1^T \phi, \, \dots, \, u_p^T \phi) \cancel{U^T} \cdot \cancel{U}D\cancel{U^T} \cdot \cancel{U} \begin{pmatrix} u_1^T \phi \\ \vdots \\ u_p^T \phi \end{pmatrix} \\ &= (u_1^T \phi, \, \dots, \, u_p^T \phi) D \begin{pmatrix} u_1^T \phi \\ \vdots \\ u_p^T \phi \end{pmatrix} \\ &= d_1 (u_1^T \phi)^2 + \cdots + d_p (u_p^T \phi)^2 \end{aligned}\]That is, to achieve the maximum variance, we must make $\phi^T\Sigma \phi$ as large as possible!
Here, the eigenvalues $d_i$ of the covariance matrix $\Sigma$ are not something we can control! What we can adjust instead is the weight $(u_p^T \phi)^2$ multiplied by the largest eigenvalue $d_p$!!
And $(u_p^T \phi)^2$ attains its largest value exactly when $\phi$ is $u_p$!! ($u_p^T u_p = 1$)
That is, the solution of the constrained maximization problem for $\phi_1$ is $u_p$, the eigenvector associated with the largest eigenvalue $d_p$!
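We can check this numerically (a minimal sketch; the random unit vectors below are only for illustration): the quadratic form $\phi^T \Sigma \phi$ never exceeds the largest eigenvalue, and the top eigenvector attains it.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.cov(rng.normal(size=(500, 4)), rowvar=False)

d, U = np.linalg.eigh(Sigma)  # ascending order, so u_p = U[:, -1]
u_p = U[:, -1]

# The top eigenvector attains the value d_p of the quadratic form...
print(np.isclose(u_p @ Sigma @ u_p, d[-1]))  # True

# ...and no random unit vector can do better.
phis = rng.normal(size=(10_000, 4))
phis /= np.linalg.norm(phis, axis=1, keepdims=True)
values = np.einsum("ij,jk,ik->i", phis, Sigma, phis)  # phi^T Sigma phi, row-wise
print((values <= d[-1] + 1e-12).all())  # True
```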
Let's now restate the process we walked through above in a more formal way!
Definition. First Principal Component
Since we are only considering variance, we may assume that each column of $X$ is centered (each column mean is 0).
Define $z_{i1}$ to be
\[z_{i1} = \phi_{11} x_{i1} + \cdots + \phi_{1p}x_{ip} = (X\phi_1)_i\]Each $z_{i1}$ is called a "score of the first principal component".
Then, we can find an estimator $\hat{\phi}_1$ of $\phi_1$ by solving the following constrained maximization problem.
\[\underset{\phi_1}{\text{maximize}} \left\{ \frac{1}{n} \sum^n_{i=1} z_{i1}^2 \right\} \quad \text{subject to} \; \sum^p_{j=1} \phi_{1j}^2 = 1\]Equivalently, $\phi_1$ can be obtained by
\[\underset{\phi_1}{\text{maximize}} \; \phi_1^T \hat{\Sigma} \phi_1 \quad \text{subject to} \; \sum^p_{j=1} \phi_{1j}^2 = 1\]where $\hat{\Sigma} = X^T X / n$, the sample covariance matrix.
Therefore, $\hat{\phi}_1$ is the eigenvector of $\hat{\Sigma}$ associated with the largest eigenvalue!
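Putting the definition into code, a minimal numpy sketch (the synthetic design matrix `X` is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated columns
X = X - X.mean(axis=0)                                  # center each column

Sigma_hat = X.T @ X / n            # sample covariance matrix
d, U = np.linalg.eigh(Sigma_hat)   # eigenvalues in ascending order
phi1_hat = U[:, -1]                # eigenvector of the largest eigenvalue

z1 = X @ phi1_hat                  # scores of the first principal component
print(np.isclose(z1 @ z1 / n, d[-1]))  # True: Var(Z_1) is the largest eigenvalue
```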
Second and Further Principal Components
The <second principal component> $Z_2$ is the normalized linear combination of $X_1, \dots, X_p$ that has maximal variance out of all normalized linear combinations that are uncorrelated with $Z_1$.
The <second principal component> can be found just as easily as $Z_1$: the eigenvector associated with the second largest eigenvalue is the loading vector $\phi_2$ of $Z_2$!! (Eigenvectors of a symmetric matrix are mutually orthogonal, which is what makes $Z_2$ uncorrelated with $Z_1$.)
For the $k$-th principal component, we simply use the eigenvector associated with the $k$-th largest eigenvalue!!
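A minimal sketch of extracting every loading vector at once (again on synthetic data): the loading vectors come out orthonormal, and the resulting score vectors are uncorrelated, as claimed.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # correlated columns
X = X - X.mean(axis=0)

d, U = np.linalg.eigh(X.T @ X / len(X))
Phi = U[:, ::-1]  # loading vectors as columns, largest eigenvalue first

Z = X @ Phi       # scores of all p principal components
print(np.allclose(Phi.T @ Phi, np.eye(5)))  # loading vectors are orthonormal
print(np.allclose(np.corrcoef(Z, rowvar=False) - np.eye(5), 0, atol=1e-8))  # scores uncorrelated
```

For what it's worth, scikit-learn's `sklearn.decomposition.PCA` computes the same loading vectors (up to sign) via an SVD of the centered `X`.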
PCA Metric
In <PCA>, we define and use a metric called the <PVE; Proportion of Variance Explained> of the $m$-th principal component.
\[\frac{\sum^n_{i=1} z_{im}^2}{\sum^p_{j=1} \sum^n_{i=1} x_{ij}^2} = \frac{\sum^n_{i=1} z_{im}^2 / n}{\sum^p_{j=1} \sum^n_{i=1} x_{ij}^2 / n} = \frac{\text{Var}(Z_m)}{\sum^p_{j=1} \text{Var}(X_j)}\]We then look at the <PVE> values to decide how many principal components to use!
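A minimal sketch of computing the PVE (synthetic data once more); since $\text{Var}(Z_m)$ equals the $m$-th largest eigenvalue, each PVE is just that eigenvalue's share of the total:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

d, U = np.linalg.eigh(X.T @ X / len(X))
Z = X @ U[:, ::-1]  # scores, first principal component first

# PVE of each component: Var(Z_m) divided by the total variance of X.
pve = Z.var(axis=0) / X.var(axis=0).sum()
print(pve)                                  # per-component PVE
print(np.cumsum(pve))                       # cumulative PVE; stop once it is "large enough"
print(np.allclose(pve, d[::-1] / d.sum()))  # True: PVE is the eigenvalue share
```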