Supp-1: Linear Algebra - 4
These are notes from the 'Statistical Data Mining' course I took in graduate school in Spring 2021. Corrections are always welcome :)
Set-up
Can we find a square-root matrix for a given matrix $A$? That is, the question is whether there exists some matrix $B$ such that $B B = A$.
First, for a matrix to have the nice property of possessing a square root, $A$ should be a symmetric matrix; $A \in \mathbb{R}^{n\times n}$.
Over the real numbers, a number $x$ has a square root only if $x \ge 0$. (Complex square roots exist too, but here let's consider the square-root operation defined over the reals.) In other words, to define a square root we must lift the notion of "non-negative" up to matrices. In that sense, it helps to think of the <Nonnegative Definite> property we examine here as an extension of "non-negative".
Nonnegative Definite Matrices
Theorem.
For a symmetric matrix $A \in \mathbb{R}^{n \times n}$, T.F.A.E.
(1) $A$ is <non-negative definite>, denoted $A \succeq 0$:
\[\mathbf{x}^T A \mathbf{x} \ge 0 \quad \text{for every} \quad \mathbf{x} \in \mathbb{R}^{n}\](2) All eigenvalues of $A$ are non-negative.
(3) $A = B^T B$ for some $B$.
(4) $A$ is the variance-covariance matrix of some random vector.
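Before the proof, here is a quick numerical sanity check of these equivalences (a minimal numpy sketch, not part of the original notes): any $B^TB$ and any sample covariance matrix should have only non-negative eigenvalues and non-negative quadratic forms.

```python
import numpy as np

rng = np.random.default_rng(0)

# (3): any A = B^T B is non-negative definite ...
B = rng.standard_normal((3, 5))
A = B.T @ B                                   # 5x5, symmetric by construction

# ... so by (2) its eigenvalues are all non-negative (up to rounding error).
print(np.all(np.linalg.eigvalsh(A) >= -1e-10))

# (4): a sample variance-covariance matrix is non-negative definite as well.
data = rng.standard_normal((100, 4))
S = np.cov(data, rowvar=False)                # 4x4 sample covariance matrix
print(np.all(np.linalg.eigvalsh(S) >= -1e-10))

# (1): x^T A x >= 0 for any x.
x = rng.standard_normal(5)
print(x @ A @ x >= 0)
```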
Let's examine what <Nonnegative Definite> means more closely. Since $A$ is a symmetric matrix, the <spectral decomposition> applies; $A = UDU^T$
\[\begin{aligned} A = UDU^T = d_1 \mathbf{u}_1 \mathbf{u}_1^T + \cdots + d_n \mathbf{u}_n \mathbf{u}_n^T \end{aligned}\]Now multiply by the vector $\mathbf{x}$ on the left and right.
\[\begin{aligned} \mathbf{x}^T A \mathbf{x} &= \mathbf{x}^T \left( d_1 \mathbf{u}_1 \mathbf{u}_1^T + \cdots + d_n \mathbf{u}_n \mathbf{u}_n^T \right) \mathbf{x} \\ &= d_1 (\mathbf{u}_1^T \mathbf{x})^2 + \cdots + d_n (\mathbf{u}_n^T \mathbf{x})^2 \quad (\because \mathbf{x}^T \mathbf{u}_i {\mathbf{u}_i}^T \mathbf{x} = \mathbf{x}^T \mathbf{u}_i \cdot {\mathbf{u}_i}^T \mathbf{x} ) \end{aligned}\]What condition do we need for $\mathbf{x}^T A \mathbf{x} \ge 0$ to hold? First, note that each term $(\mathbf{u}_i^T \mathbf{x})^2$ on the right-hand side is non-negative. Therefore, if all the $d_i$ are non-negative, the right-hand side is clearly non-negative! This proves the ($\impliedby$) direction.
To prove the ($\implies$) direction, it suffices to set $\mathbf{x} = \mathbf{u}_i$.
\[{\mathbf{u}_i}^T A \mathbf{u}_i = d_i ({\mathbf{u}_i}^T \mathbf{u}_i)^2 \ge 0\]Since the columns of $U$ are orthonormal, ${\mathbf{u}_i}^T \mathbf{u}_i = 1$, so this inequality forces $d_i \ge 0$.
$\blacksquare$
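The expansion used in the proof, $\mathbf{x}^T A \mathbf{x} = \sum_i d_i (\mathbf{u}_i^T \mathbf{x})^2$, can also be checked numerically (a short numpy sketch using the symmetric eigendecomposition):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                # symmetrize to get a symmetric matrix

d, U = np.linalg.eigh(A)         # spectral decomposition: A = U diag(d) U^T
x = rng.standard_normal(4)

lhs = x @ A @ x
rhs = sum(d[i] * (U[:, i] @ x) ** 2 for i in range(4))
print(np.isclose(lhs, rhs))      # True
```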
Positive Definite Matrices
Theorem.
For a symmetric matrix $A \in \mathbb{R}^{n \times n}$, T.F.A.E.
(1) $A$ is <positive definite>, denoted $A \succ 0$:
\[\mathbf{x}^T A \mathbf{x} > 0 \quad \text{for every} \quad \mathbf{x} \in \mathbb{R}^{n} \setminus \{\mathbf{0}\}\](2) All eigenvalues of $A$ are positive.
(3) $A = B^T B$ for some non-singular $B$.
(4) $A$ is non-negative definite and non-singular.
(5) There exist linearly independent vectors $\mathbf{u}_1, \cdots, \mathbf{u}_n \in \mathbb{R}^n$ s.t. \(\displaystyle A = \sum^n_{j=1} \mathbf{u}_j {\mathbf{u}_j}^T\)
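Condition (5) is also easy to check numerically: stacking $n$ linearly independent vectors as the rows of a matrix $B$, the sum of outer products $\sum_j \mathbf{u}_j {\mathbf{u}_j}^T$ equals $B^T B$ with non-singular $B$, and the eigenvalues come out strictly positive (a numpy sketch; random vectors are linearly independent with probability 1).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# n random vectors in R^n are linearly independent with probability 1.
B = rng.standard_normal((n, n))               # row j is u_j
A = sum(np.outer(u, u) for u in B)            # A = sum_j u_j u_j^T  (= B^T B)

print(np.linalg.matrix_rank(B) == n)          # (3): B is non-singular
print(np.all(np.linalg.eigvalsh(A) > 0))      # (2): eigenvalues strictly positive
```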
Let's look at statement (4) more closely. If the matrix $A$ is SPD (symmetric and positive definite), how can we find $A^{1/2}$? The answer, once again, lies in the <Spectral Decomposition>!!
By the <Spectral Decomposition>, the matrix $A$ factors as follows.
\[A = UDU^T\]If we now compute $A^2$,
\[A^2 = A \cdot A = (UDU^T) (UDU^T) = UD^2U^T\]In other words, because we have the orthogonal matrix $U$ on our side ($U^TU = I$), power operations on the matrix $A$ become easy!!
Let's derive $A^{1/2}$ with the same idea. Reasoning simply, shouldn't it be the following?
\[A^{1/2} = UD^{1/2}U^T\]Correct! In the same way, negative powers are also easy to define.
\[A^{-1} = UD^{-1}U^T\]
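Here is that construction in code: a minimal numpy sketch (not from the original notes) that builds a symmetric positive definite $A$, forms $A^{1/2} = UD^{1/2}U^T$ and $A^{-1} = UD^{-1}U^T$, and verifies both.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = M.T @ M + 4 * np.eye(4)               # a symmetric positive definite matrix

d, U = np.linalg.eigh(A)                  # spectral decomposition: A = U diag(d) U^T, d > 0

A_half = U @ np.diag(np.sqrt(d)) @ U.T    # A^{1/2} = U D^{1/2} U^T
A_inv  = U @ np.diag(1.0 / d) @ U.T       # A^{-1}  = U D^{-1}  U^T

print(np.allclose(A_half @ A_half, A))    # True: A^{1/2} A^{1/2} = A
print(np.allclose(A_inv @ A, np.eye(4)))  # True: A^{-1} A = I
```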
Convex Function
Using a positive definite matrix $A$, we can derive a <convex function>. First, let's look at the definition of a <convex function>.
Definition.
A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is said to be <convex> if
\[f(\lambda \mathbf{x} + (1-\lambda)\mathbf{y}) \le \lambda f(\mathbf{x}) + (1-\lambda) f(\mathbf{y})\]for every $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and $0\le \lambda \le 1$.
Looking closely at the expression, $\lambda \mathbf{x} + (1-\lambda)\mathbf{y}$ is a point lying between $\mathbf{x}$ and $\mathbf{y}$. Meanwhile, $\lambda f(\mathbf{x}) + (1-\lambda) f(\mathbf{y})$ is the corresponding point on the chord connecting $(\mathbf{x}, f(\mathbf{x}))$ and $(\mathbf{y}, f(\mathbf{y}))$.
In other words, the inequality says that the value of the function at any such intermediate point always lies below the chord!
Let's look at a theorem about <convex> functions.
Theorem.
Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be twice continuously differentiable. Then $f$ is convex if and only if
\[\frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^T} \succeq 0\]That is, if the second derivative (the Hessian) is always nonnegative definite, the function is convex! Recalling the quadratic $f(x) = ax^2 + bx + c$, which is convex exactly when $f''(x) = 2a \ge 0$, should make this feel natural.
Now let's derive a convex function from a matrix satisfying $A \succeq 0$.
Example.
A quadratic function $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{a}^T \mathbf{x}$ is convex if and only if $A \succeq 0$.
If we define the function $f(\mathbf{x})$ as above and differentiate twice, we get
\[\frac{\partial^2 f}{\partial \mathbf{x} \partial \mathbf{x}^T} = A + A^T = 2A \succeq 0\]so $f$ is a convex function. Convexity is a truly important property for quadratic forms, because a quadratic form must be convex before we can meaningfully discuss its max/min!
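As a numerical check (a small numpy sketch; the particular $A = B^TB$ and $\mathbf{a}$ below are arbitrary choices), we can sample random pairs of points and confirm the defining convexity inequality for this quadratic $f$:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
A = B.T @ B                        # A >= 0, so f should be convex
a = rng.standard_normal(3)

def f(x):
    return x @ A @ x + a @ x

# Check f(lam x + (1-lam) y) <= lam f(x) + (1-lam) f(y) on random pairs.
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform()
    assert f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print("convexity inequality held on all sampled pairs")
```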
Orthogonal Projection
Definition.
For a subspace $\mathcal{L} \subseteq \mathbb{R}^n$, the <orthogonal complement> of $\mathcal{L}$ is defined as
\[\mathcal{L}^{\perp} = \{ \mathbf{x} \in \mathbb{R}^n : \mathbf{x}^T \mathbf{y} = 0 \quad \text{for all} \quad \mathbf{y} \in \mathcal{L} \}\]The orthogonal complement takes the familiar notion of orthogonality between vectors and applies it to a vector space.
Theorem.
Each $\mathbf{x} \in \mathbb{R}^n$ can be uniquely represented as
\[\mathbf{x} = \mathbf{x}_{\mathcal{L}} + \mathbf{x}_{\mathcal{L^{\perp}}}\]where \(\mathbf{x}_{\mathcal{L}} \in \mathcal{L}\) and \(\mathbf{x}_{\mathcal{L^{\perp}}} \in \mathcal{L}^{\perp}\).
Here, the vector $\mathbf{x}_{\mathcal{L}}$ is called the <orthogonal projection> of $\mathbf{x}$ onto $\mathcal{L}$.
Moreover, this <orthogonal projection> is a linear mapping. So it can be expressed in matrix form!!
The map $\mathbf{x} \mapsto \mathbf{x}_{\mathcal{L}}$ is a linear mapping.
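Concretely, if the columns of a matrix $L$ form a basis of $\mathcal{L}$, we can compute $\mathbf{x}_{\mathcal{L}}$ by least squares and take the residual as the $\mathcal{L}^{\perp}$ part (a numpy sketch; the 2-dimensional subspace of $\mathbb{R}^4$ below is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(5)
L = rng.standard_normal((4, 2))        # columns span a 2-dim subspace L of R^4
x = rng.standard_normal(4)

# x_L solves min_b ||x - L b||, i.e. the orthogonal projection onto C(L).
b, *_ = np.linalg.lstsq(L, x, rcond=None)
x_L = L @ b
x_perp = x - x_L

print(np.allclose(x, x_L + x_perp))    # the decomposition x = x_L + x_perp
print(np.allclose(L.T @ x_perp, 0))    # x_perp is orthogonal to every vector in L
```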
Theorem.
\[\| \mathbf{x} - \mathbf{x}_{\mathcal{L}} \| \le \| \mathbf{x} - \mathbf{y} \| \quad \text{for every} \quad \mathbf{y} \in \mathcal{L}\]The inequality above means that, among all distances between $\mathbf{x}$ and vectors in $\mathcal{L}$, the orthogonal projection $\mathbf{x}_{\mathcal{L}}$ achieves the shortest one. In a picture, $\mathbf{x}_{\mathcal{L}}$ is the foot of the perpendicular dropped from $\mathbf{x}$ onto $\mathcal{L}$.
Definition. idempotent or projection
$A \in \mathbb{R}^{n\times n}$ is called an <idempotent> or <projection> matrix if $A^2 = A$.
Theorem.
T.F.A.E.
(1) $A\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$ onto $\mathcal{C}(A)$.
In this statement, the operation (mapping) of multiplying $\mathbf{x}$ by $A$ induces the set $\{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n \}$, and this set is precisely the column space $\mathcal{C}(A)$.
(2) $A$ is a projection and $\mathcal{N}(A) \perp \mathcal{C}(A)$.
That is, for every $\mathbf{x} \in \mathcal{N}(A)$ and $\mathbf{y} \in \mathcal{C}(A)$, we have $\mathbf{x}^T \mathbf{y} = 0$.
(3) $A$ is symmetric and idempotent.
So if any one of the statements above holds, $A$ is the <orthogonal projection matrix> onto $\mathcal{C}(A)$.
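To see statement (3) in action, here is a small numpy sketch (not from the original notes): build $P = UU^T$ from a matrix $U$ with orthonormal columns, then check that $P$ is symmetric, idempotent, and that residuals are orthogonal to $\mathcal{C}(P)$.

```python
import numpy as np

rng = np.random.default_rng(6)

# QR gives orthonormal columns; P = U U^T projects onto span(U) = C(P).
U, _ = np.linalg.qr(rng.standard_normal((5, 2)))
P = U @ U.T

print(np.allclose(P, P.T))                # symmetric
print(np.allclose(P @ P, P))              # idempotent: P^2 = P
x = rng.standard_normal(5)
print(np.allclose(P @ (x - P @ x), 0))    # residual x - Px lies in N(P), orthogonal to C(P)
```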
Theorem.
Let $A \in \mathbb{R}^{n\times n}$ be symmetric. Then, T.F.A.E.
(1) $A^2 = A$
(2) All eigenvalues of $A$ are either 0 or 1.
(3) $\text{rank}(A) + \text{rank}(I_n - A) = n$
((1)$\implies$(2)) can be proved easily using the <spectral decomposition>.
Because $A$ is symmetric, $A = UDU^T$ by the spectral theorem.
By statement (1), $A^2 = A$
\[A^2 = (UDU^T)(UDU^T) = UD^2U^T = UDU^T\]Therefore $D^2 = D$. For this to hold, we need $d_i^2 = d_i$ for every $i$, which is possible only when $d_i = 0$ or $d_i = 1$. $\blacksquare$
The fact that every eigenvalue $d_i$ is 0 or 1 tells us that the projection $A$ is an operation that keeps only those directions $\mathbf{u}_i$ with $d_i = 1$.
Let's also prove ((2)$\implies$(3)). This follows easily from the relationship between rank and eigenvalues.
For a symmetric matrix, the rank equals the number of non-zero eigenvalues. Since the eigenvalues of the orthogonal projection $A$ are all 0 or 1, we simply count the number of $d_i = 1$.
Now look at $I_n - A$: it flips each $d_i$, since its eigenvalues are $1 - d_i$. Therefore the rank of $I_n - A$ is complementary to that of $A$: $\text{rank}(I_n - A) = n - \text{rank}(A)$. $\blacksquare$
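Both facts are easy to confirm numerically (a numpy sketch reusing the $P = UU^T$ construction from above): the eigenvalues come out as exactly 0s and 1s, and the rank identity (3) holds.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
U, _ = np.linalg.qr(rng.standard_normal((n, 2)))
P = U @ U.T                                   # symmetric idempotent of rank 2

d = np.linalg.eigvalsh(P)                     # eigenvalues in ascending order
print(np.allclose(d, [0, 0, 0, 0, 1, 1]))     # (2): eigenvalues are 0 or 1

rank_sum = np.linalg.matrix_rank(P) + np.linalg.matrix_rank(np.eye(n) - P)
print(rank_sum == n)                          # (3): rank(P) + rank(I - P) = n
```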
Finally, the last Theorem. The statement below gets used over and over again when we cover <Regression> in this <Statistical Data Mining> course, so it is really important! 🔥
Theorem.
Let $X = (\mathbf{x}_1, \dots, \mathbf{x}_p)$ be an $n\times p$ matrix with $\text{rank}(X) = p$ and
\[H = X(X^TX)^{-1}X^T\]Then, $H$ is the orthogonal projection onto $\mathcal{C}(X)$, that is
(1) $H^2 = H$
(2) $\mathcal{C}(H) \perp \mathcal{N}(H)$
(3) $\mathcal{C}(H) = \mathcal{C}(X)$
The matrix $H$ derived from $X$ in this way is called the <hat matrix>.
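To close, a numpy sketch of the hat matrix (the design matrix below is an arbitrary full-rank example): it verifies the three properties numerically and shows the regression connection, namely that $H\mathbf{y}$ equals the OLS fitted values $X\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 20, 3
X = rng.standard_normal((n, p))               # full column rank with probability 1
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix H = X (X^T X)^{-1} X^T

print(np.allclose(H @ H, H))                  # (1): idempotent
print(np.allclose(H, H.T))                    # symmetric, so (2): C(H) is perpendicular to N(H)
print(np.linalg.matrix_rank(H) == p)          # (3): C(H) = C(X), a p-dimensional space

# In regression, H "puts the hat on y": H y equals the OLS fitted values.
y = rng.standard_normal(n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(H @ y, X @ beta_hat))
```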