This post organizes what I studied in the β€˜Statistical Data Mining’ course I took at university in Spring 2021. Corrections are always welcome :)


Set-up

Can we find a square-root matrix of a given matrix $A$? That is, is there a matrix $B$ such that $B B = A$?

For a matrix to have a well-behaved square root, $A$ should first of all be a symmetric matrix; $A \in \mathbb{R}^{n\times n}$.

μ‹€μˆ˜ μ˜μ—­μ—μ„œλŠ” μ–΄λ–€ 수 $x$κ°€ μ œκ³±κ·Όμ„ 가지렀면, κ·Έ μˆ˜κ°€ $x \ge 0$ μ—¬μ•Ό ν–ˆλ‹€. (λ³΅μ†Œ μ œκ³±κ·Όλ„ μžˆμ§€λ§Œ, μ—¬κΈ°μ„œλŠ” μ‹€μˆ˜ μ˜μ—­ μœ„μ— μ •μ˜λœ 제곱근 연산을 μƒκ°ν•˜μž.) 즉, μ œκ³±κ·Όμ„ μ •μ˜ν•˜κΈ° μœ„ν•΄μ„  non-negative”에 λŒ€ν•œ κ°œλ…μ„ ν–‰λ ¬μ˜ μ˜μ—­μœΌλ‘œ λŒμ–΄μ˜¬λ €μ•Ό ν•œλ‹€. 그런 μ μ—μ„œ μ΄λ²ˆμ— μ‚΄νŽ΄λ³Ό <Nonnegative Definite>λŠ” β€œnon-negative”λ₯Ό ν™•μž₯ν•œ 것이라고 μƒκ°ν•˜λ©΄ μ’‹λ‹€.

Nonnegative Definite Matrices

Theorem.

For a symmetric matrix $A \in \mathbb{R}^{n \times n}$, T.F.A.E.

(1) $A$ is <non-negative definite>, denoted $A \succeq 0$:

\[\mathbf{x}^T A \mathbf{x} \ge 0 \quad \text{for every} \quad \mathbf{x} \in \mathbb{R}^{n}\]

(2) All eigenvalues of $A$ are non-negative.

(3) $A = B^T B$ for some $B$.

(4) $A$ is the variance-covariance matrix of some random vector.


Let us look more closely at what <Nonnegative Definite> means. Since $A$ is a symmetric matrix, it admits a <spectral decomposition>; $A = UDU^T$

\[\begin{aligned} A = UDU^T = d_1 \mathbf{u}_1 \mathbf{u}_1^T + \cdots + d_n \mathbf{u}_n \mathbf{u}_n^T \end{aligned}\]

Multiply by a vector $\mathbf{x}$ on both sides:

\[\begin{aligned} \mathbf{x}^T A \mathbf{x} &= \mathbf{x}^T \left( d_1 \mathbf{u}_1 \mathbf{u}_1^T + \cdots + d_n \mathbf{u}_n \mathbf{u}_n^T \right) \mathbf{x} \\ &= d_1 (\mathbf{u}_1^T \mathbf{x})^2 + \cdots + d_n (\mathbf{u}_n^T \mathbf{x})^2 \quad (\because \mathbf{x}^T \mathbf{u}_i {\mathbf{u}_i}^T \mathbf{x} = \mathbf{x}^T \mathbf{u}_i \cdot {\mathbf{u}_i}^T \mathbf{x} ) \end{aligned}\]

What condition is needed for $\mathbf{x}^T A \mathbf{x} \ge 0$? First, note that each $(\mathbf{u}_i^T \mathbf{x})^2 \ge 0$ on the right-hand side. Therefore, if every $d_i$ is non-negative, the right-hand side is clearly non-negative! This proves the ($\impliedby$) direction.

To prove the ($\implies$) direction, it suffices to set $\mathbf{x} = \mathbf{u}_i$:

\[{\mathbf{u}_i}^T A \mathbf{u}_i = d_i ({\mathbf{u}_i}^T \mathbf{u}_i)^2 \ge 0\]

μœ„μ˜ 뢀등식을 λ§Œμ‘±ν•˜κΈ° μœ„ν•΄μ„œλŠ” $d_i \ge 0$이 λ˜μ–΄μ•Ό ν•œλ‹€.

$\blacksquare$
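As a quick sanity check, here is a minimal numpy sketch (my own illustration, not from the course) that builds a nonnegative definite matrix via condition (3) and verifies conditions (1) and (2) numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Condition (3): A = B^T B for some B, so A should be nonnegative definite.
B = rng.standard_normal((5, 3))
A = B.T @ B

# Condition (2): all eigenvalues are non-negative (up to floating-point error).
eigvals = np.linalg.eigvalsh(A)   # eigvalsh is for symmetric matrices
print(np.all(eigvals >= -1e-10))  # True

# Condition (1): x^T A x >= 0 for (many random) x.
for _ in range(1000):
    x = rng.standard_normal(3)
    assert x @ A @ x >= -1e-10
```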


Positive Definite Matrices

Theorem.

For a symmetric matrix $A \in \mathbb{R}^{n \times n}$, T.F.A.E.

(1) $A$ is <positive definite>, denoted $A \succ 0$:[^1]

\[\mathbf{x}^T A \mathbf{x} > 0 \quad \text{for every} \quad \mathbf{x} \in \mathbb{R}^{n} \setminus \{\mathbf{0}\}\]

(2) All eigenvalues of $A$ are positive.

(3) $A = B^T B$ for some non-singular $B$.

(4) $A$ is non-negative definite and non-singular.

(5) There exist linearly independent vectors $\mathbf{u}_1, \cdots, \mathbf{u}_n \in \mathbb{R}^n$ s.t. \(\displaystyle A = \sum^n_{j=1} \mathbf{u}_j {\mathbf{u}_j}^T\)


Let us look at statement (4) more closely. If the matrix $A$ is SPD (symmetric and positive definite), how can we find $A^{1/2}$? Once again, the answer lies in the <Spectral Decomposition>!!

By the <Spectral Decomposition>, the matrix $A$ factors as follows.

\[A = UDU^T\]

λ§Œμ•½ μ΄λ•Œ $A^2$을 κ΅¬ν•œλ‹€λ©΄,

\[A^2 = A \cdot A = (UDU^T)(UDU^T) = UD(U^TU)DU^T = UD^2U^T\]

In other words, thanks to the nice orthogonal matrix $U$, we can easily take powers of the matrix $A$ (power operations)!!

이 같은 μ•„μ΄λ””μ–΄λ‘œ $A^{1/2}$λ₯Ό μœ λ„ν•΄λ³΄μž. κ°„λ‹¨ν•˜κ²Œ μΆ”λ‘ ν•˜λ©΄ μ•„λž˜μ™€ 같이 λ˜μ§€ μ•Šμ„κΉŒ?

\[A^{1/2} = UD^{1/2}U^T\]

Correct! Indeed, $(UD^{1/2}U^T)(UD^{1/2}U^T) = UD^{1/2}D^{1/2}U^T = UDU^T = A$, and $D^{1/2}$ is well-defined because every $d_i \ge 0$. In the same way, negative powers are easy to define as well.

\[A^{-1} = UD^{-1}U^T\]
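Here is a minimal numpy sketch of this recipe (my own illustration; the variable names are mine): it computes $A^{1/2}$ and $A^{-1}$ of an SPD matrix through its spectral decomposition and verifies the results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an SPD matrix: M M^T is nonnegative definite, and adding 4*I
# shifts every eigenvalue up by 4, making A strictly positive definite.
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)

d, U = np.linalg.eigh(A)                 # A = U @ diag(d) @ U.T
A_half = U @ np.diag(np.sqrt(d)) @ U.T   # A^{1/2} = U D^{1/2} U^T
A_inv = U @ np.diag(1.0 / d) @ U.T       # A^{-1}  = U D^{-1} U^T

print(np.allclose(A_half @ A_half, A))       # True: (A^{1/2})^2 = A
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```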

Convex Function

Using a positive definite matrix $A$, we can derive a <convex function>[^2]. First, let us look at the definition of a <convex function>.

Definition.

A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is said to be <convex> if

\[f(\lambda \mathbf{x} + (1-\lambda)\mathbf{y}) \le \lambda f(\mathbf{x}) + (1-\lambda) f(\mathbf{y})\]

for every $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and $0\le \lambda \le 1$.

Looking at the expression, $\lambda \mathbf{x} + (1-\lambda)\mathbf{y}$ is a point on the segment between $\mathbf{x}$ and $\mathbf{y}$, while $\lambda f(\mathbf{x}) + (1-\lambda) f(\mathbf{y})$ is the corresponding point on the chord joining $(\mathbf{x}, f(\mathbf{x}))$ and $(\mathbf{y}, f(\mathbf{y}))$.

In other words, the inequality says that the function value at the in-between point is never larger than the value on the chord!


Let us look at a theorem about <convex> functions.

Theorem.

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be twice continuously differentiable. Then $f$ is convex if and only if

\[\frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^T} \succeq 0\]

That is, if the second derivative (the Hessian) is nonnegative definite everywhere, the function is convex! Recalling the one-dimensional case $f(x) = ax^2 + bx + c$, where $f''(x) = 2a \ge 0$ exactly when $a \ge 0$, should make this feel intuitive.


μ΄λ²ˆμ—λŠ” $A \succeq 0$ 인 행렬을 λ°”νƒ•μœΌλ‘œ μ–΄λ–€ convex function을 μœ λ„ν•΄λ³΄μž.

Example.

A quadratic function $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{a}^T \mathbf{x}$ is convex if and only if $A \succeq 0$.

With $f(\mathbf{x})$ defined as above (taking $A$ symmetric), differentiating twice gives

\[\frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^T} = A + A^T = 2A \succeq 0\]

so $f$ is a convex function. This convexity of the quadratic form is really important: a quadratic form needs to be convex before we can discuss its max/min!
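To make the inequality concrete, here is a small numpy sketch (my own check, with illustrative names) that draws random pairs of points and verifies the convexity inequality for a randomly generated $A \succeq 0$.

```python
import numpy as np

rng = np.random.default_rng(2)

# A = B^T B is nonnegative definite, so f should be convex.
B = rng.standard_normal((3, 3))
A = B.T @ B
a = rng.standard_normal(3)

f = lambda x: x @ A @ x + a @ x

# Check f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) on random inputs.
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print("convexity inequality holds on all samples")
```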


Orthogonal Projection

Definition.

For a subspace $\mathcal{L} \subseteq \mathbb{R}^n$, the <orthogonal complement> of $\mathcal{L}$ is defined as

\[\mathcal{L}^{\perp} = \{ \mathbf{x} \in \mathbb{R}^n : \mathbf{x}^T \mathbf{y} = 0 \quad \text{for all} \quad \mathbf{y} \in \mathcal{L} \}\]

ν‰μ†Œμ— μƒκ°ν•˜λ˜ orthogonal에 λŒ€ν•œ κ°œλ…μ„ vector space에 μ μš©ν•œ 것이 orthogonal complementλ‹€.

Theorem.

Each $\mathbf{x} \in \mathbb{R}^n$ can be uniquely represented as

\[\mathbf{x} = \mathbf{x}_{\mathcal{L}} + \mathbf{x}_{\mathcal{L^{\perp}}}\]

where \(\mathbf{x}_{\mathcal{L}} \in \mathcal{L}\) and \(\mathbf{x}_{\mathcal{L^{\perp}}} \in \mathcal{L}^{\perp}\).

Here, we call the vector $\mathbf{x}_{\mathcal{L}}$ the <orthogonal projection> of $\mathbf{x}$ onto $\mathcal{L}$.

Moreover, this <orthogonal projection> is a linear mapping, so it can be expressed as a matrix!!

The map $\mathbf{x} \mapsto \mathbf{x}_{\mathcal{L}}$ is a linear mapping.

Theorem.

\[\| \mathbf{x} - \mathbf{x}_{\mathcal{L}}\| \le \| \mathbf{x} - \mathbf{y} \| \quad \text{for every} \quad \mathbf{y} \in \mathcal{L}\]

μœ„ λΆ€λ“±μ‹μ˜ μ˜λ―ΈλŠ” $\mathcal{L}$ μœ„μ˜ 벑터와 $\mathbf{x}$ μ‚¬μ΄μ˜ 거리λ₯Ό 잴 λ•Œ, orthogonal proj. $\mathbf{x}_{\mathcal{L}}$이 κ°€μž₯ 짧은 거리λ₯Ό λ±‰μŒμ„ λ§ν•œλ‹€. 그림으둜 ν™•μΈν•˜λ©΄ μ•„λž˜μ™€ κ°™λ‹€.

Definition. idempotent or projection

$A \in \mathbb{R}^{n\times n}$ is called an <idempotent> or <projection> matrix if $A^2 = A$.

Theorem.

T.F.A.E.

(1) $A\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$ onto $\mathcal{C}(A)$.

This statement says that the mapping $\mathbf{x} \mapsto A\mathbf{x}$ produces the set $\{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n \}$, and this set is exactly the column space $\mathcal{C}(A)$.

(2) $A$ is a projection and $\mathcal{N}(A) \perp \mathcal{C}(A)$.

That is, for every $\mathbf{x} \in \mathcal{N}(A)$ and $\mathbf{y} \in \mathcal{C}(A)$, we have $\mathbf{x}^T \mathbf{y} = 0$.

(3) $A$ is symmetric and idempotent.

So if any one of the statements above holds, $A$ is the <orthogonal projection matrix> onto $\mathcal{C}(A)$.


Theorem.

Let $A \in \mathbb{R}^{n\times n}$ be symmetric. Then, T.F.A.E.

(1) $A^2 = A$

(2) All eigenvalues of $A$ are either 0 or 1.

(3) $\text{rank}(A) + \text{rank}(I_n - A) = n$

((1)$\implies$(2)) can be proved easily using the <spectral decomposition>.

Because $A$ is symmetric, $A = UDU^T$ by the spectral theorem.

By statement (1), $A^2 = A$

\[A^2 = (UDU^T)(UDU^T) = UD^2U^T = UDU^T\]

λ”°λΌμ„œ, $D^2 = D$. 이것을 λ§Œμ‘±ν•˜λ €λ©΄, $d_i^2 = d_i$κ°€ λ˜μ–΄μ•Ό ν•œλ‹€. 이것은 $d_i = 0$ or $d_i = 1$일 λ•Œλ§Œ κ°€λŠ₯ν•˜λ‹€. $\blacksquare$

The fact that each eigenvalue $d_i$ is 0 or 1 tells us that the projection $A$ is an operator that keeps only those directions $\mathbf{u}_i$ with $d_i = 1$.

Let us also prove ((2)$\implies$(3)). This follows easily from the relationship between rank and eigenvalues.

For a symmetric matrix, the rank equals the number of non-zero eigenvalues. Since the eigenvalues of the orthogonal projection $A$ are 0 or 1, we just count the indices with $d_i = 1$.

Now look at $I_n - A$: its eigenvalues are $1 - d_i$, so it toggles each $d_i$ between 0 and 1. Therefore the rank of $I_n - A$ is complementary to that of $A$: $\text{rank}(I_n - A) = n - \text{rank}(A)$. $\blacksquare$
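A small numpy sketch of both facts (my own construction for illustration): build a symmetric idempotent matrix with prescribed 0/1 eigenvalues and check the rank identity.

```python
import numpy as np

rng = np.random.default_rng(4)

# Orthogonal U from a QR decomposition; eigenvalues fixed to 0 or 1.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
d = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
A = U @ np.diag(d) @ U.T       # symmetric with eigenvalues 0/1

print(np.allclose(A @ A, A))   # True: A is idempotent
r1 = np.linalg.matrix_rank(A)
r2 = np.linalg.matrix_rank(np.eye(5) - A)
print(r1, r2, r1 + r2)         # 2 3 5, i.e. rank(A) + rank(I - A) = n
```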


Finally, the last Theorem. The statement below is used over and over when this <Statistical Data Mining> course covers <Regression>, so it is really important! πŸ”₯

Theorem.

Let $X = (\mathbf{x}_1, \dots, \mathbf{x}_p)$ be an $n\times p$ matrix with $\text{rank}(X) = p$[^3] and

\[H = X(X^TX)^{-1}X^T\]

Then, $H$ is the orthogonal projection onto $\mathcal{C}(X)$, that is

(1) $H^2 = H$

(2) $\mathcal{C}(H) \perp \mathcal{N}(H)$

(3) $\mathcal{C}(H) = \mathcal{C}(X)$

μ΄λ•Œ, $X$λ‘œλΆ€ν„° μœ λ„ν•œ matrix $H$λ₯Ό <hat matrix>라고 ν•œλ‹€.


[^1]: For reference, the reason we exclude the zero vector is that plugging in $\mathbf{x} = \mathbf{0}$ always gives $\mathbf{x}^T A \mathbf{x} = 0$.

[^2]: β€œConvex” means bowed outward; β€œconcave” means bowed inward.

[^3]: This means that all the columns $\mathbf{x}_i$ of $X$ are linearly independent.