Supp-1: Linear Algebra - 1
Notes from the "Statistical Data Mining" course I took at university in the 2021 spring semester. Corrections are always welcome :)
(Review) Matrix
✨ (Important) Unless stated otherwise, every vector is a column vector. ✨
A $3 \times 1$ vector $\mathbf{x}$ is
\[\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\]A <matrix> is a (column) vector whose elements are themselves vectors.
A $4 \times 3$ matrix $\mathbf{X}$ is
\[\mathbf{X} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \mathbf{x}_4^T \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix}\]- $X_i$: the $i$-th element of $X$, i.e. $\mathbf{x}_i$
- $X_{ij}$: the $j$-th element of $\mathbf{x}_i$, the $i$-th element of $X$
It is important to internalize the perspective 👀 of interpreting a matrix in terms of its column vectors!
Matrix Multiplication
The starting point of matrix multiplication is coordinate transformation.
\[\begin{pmatrix} c_1 & c_2 & c_3 & c_4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = c_1 x + c_2 y + c_3 z + c_4 t\]The example above is the simple case; going a little further,
\[X \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \mathbf{x}_4^T \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^T \cdot (x, y, z, t)^T \\ \mathbf{x}_2^T \cdot (x, y, z, t)^T \\ \mathbf{x}_3^T \cdot (x, y, z, t)^T \\ \mathbf{x}_4^T \cdot (x, y, z, t)^T \end{pmatrix}\]Matrix multiplication as a tool for coordinate transformation, that is, linear transformation:
For $A \in \mathbb{R}^{n\times p}$, $\mathbf{x} \in \mathbb{R}^{p\times 1}$
\[A \mathbf{x} \in \mathbb{R}^{n\times 1}\]That is, it maps a $p$-dimensional vector into dimension $n$. To do this we need $n$ vectors of dimension $p$, and those $n$ vectors are exactly the matrix $A$!
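To make this concrete, here is a small NumPy sketch (NumPy and the variable names are my additions, not from the lecture): the $4 \times 3$ matrix from above maps a $3$-dimensional vector to a $4$-dimensional one.

```python
import numpy as np

# A maps a p-dimensional vector to an n-dimensional one (here n=4, p=3).
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
x = np.array([1.0, 0.0, 0.0])  # the first standard basis vector of R^3

y = A @ x          # y lives in R^4
print(y.shape)     # (4,)
print(y)           # the first column of A: [ 1.  4.  7. 10.]
```

Multiplying by a standard basis vector picks out the corresponding column, which is exactly the "columns as images of basis vectors" view.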
Definition. Design Matrix
In the first lecture of the regular course, <linear regression> was defined mathematically, and with it the <design matrix> $X$. The <design matrix> is an $n \times p$ matrix collecting $n$ input-feature vectors of dimension $p$.
\[X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} \in \mathbb{R}^{n\times p}\]In <linear regression>, we multiply by the coefficient vector $\beta$ to get $X \beta \in \mathbb{R}^{n\times 1}$. The result is an $n\times 1$ column vector, and I once wondered: if $n \ge 1{,}000$, isn't $X\beta$ a vector of enormous dimension, an enormously large linear transformation? That question was settled by the equation below.
\[Y = X\beta + \epsilon\]That is, in $X\beta$, the matrix $X$ is not being used as a linear transformation. Rather, it is better to take it as a "matrix" devised to carry out the <regression> itself! 😄
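A quick NumPy sketch of the model $Y = X\beta + \epsilon$ (the data, the true $\beta$, and the noise scale are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 3                      # n samples, p input features
X = rng.normal(size=(n, p))         # design matrix: n rows of p-dim feature vectors
beta = np.array([2.0, -1.0, 0.5])   # hypothetical true coefficients
eps = rng.normal(scale=0.1, size=n) # noise term

Y = X @ beta + eps                  # the regression model Y = X beta + eps
print(Y.shape)                      # (1000,): one response value per sample

# Least squares recovers the coefficients from (X, Y).
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)                     # close to [2, -1, 0.5]
```

Note that $X\beta$ produces one entry per sample, not a point in some huge transformed space, which is exactly the point of the remark above.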
Linear Algebra
Definition. Vector Space
A <vector space> is a set, consisting of vectors, equipped with scalar multiplication and vector addition.
Definition. Linear Function
For two vector spaces $V$ and $W$, a function $f: V \longrightarrow W$ is said to be <linear> if
\[f(c\mathbf{x} + \mathbf{y}) = cf(\mathbf{x}) + f(\mathbf{y}) \quad \text{for all} \; \mathbf{x}, \mathbf{y} \in V \; \text{and} \; c \in \mathbb{R}\]
Theorem.
Every <linear function between Euclidean spaces> can be represented as a matrix multiplication.
\[f(\mathbf{x}) = A \mathbf{x}\]
Note.
An $n \times p$ matrix $A = (a_{ij})$ can be interpreted as a <linear function from $\mathbb{R}^p$ to $\mathbb{R}^n$>!!
\[f_A (\mathbf{x}) = A\mathbf{x}\]Usually, for an $n \times p$ matrix, we interpret
- $n$ = sample size
- $p$ = # of input variables
Definition. Linear Subspace
$\mathcal{L} \subset \mathbb{R}^n$ is called a <linear subspace> of $\mathbb{R}^n$ if
\[\text{For} \quad \mathbf{x}, \mathbf{y} \in \mathcal{L} \quad \text{and} \quad c \in \mathbb{R} \implies c \mathbf{x} + \mathbf{y} \in \mathcal{L}\]
Definition. span
$\mathcal{L}$ is <spanned> by $\left\{ \mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ if
\[\mathcal{L} = \left\{ c_1 \mathbf{x}_1 + \cdots + c_k \mathbf{x}_k : c_j \in \mathbb{R} \right\}\]From now on, such a spanned subspace is written as below.
\[\left[ \mathbf{x}_1, \dots, \mathbf{x}_k \right]\]
Definition. Linearly Independent
$\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is said to be <linearly independent> if
\[c_1 \mathbf{x}_1 + \cdots + c_k \mathbf{x}_k = 0 \iff c_1 = \cdots = c_k = 0\]That is, for the linear combination to equal $0$, all coefficients $c_i$ must be $0$!
Theorem. T.F.A.E.
- $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is linearly independent.
- Any $\mathbf{x}_i$ cannot be represented as a linear combination of $\left\{ \mathbf{x}_j : j \ne i \right\}$.
- That is, \(\displaystyle \mathbf{x}_i = \sum_{j \ne i} c_j \mathbf{x}_j\) is impossible.
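In practice, linear independence can be checked numerically: stack the vectors as columns, and they are independent iff the matrix has full column rank. A small NumPy sketch (the vectors are my own toy examples):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + 2 * v2                  # deliberately a linear combination of v1, v2

indep = np.column_stack([v1, v2])      # 3x2 matrix
dep = np.column_stack([v1, v2, v3])    # 3x3 matrix

print(np.linalg.matrix_rank(indep))    # 2 = #columns -> independent
print(np.linalg.matrix_rank(dep))      # 2 < 3       -> dependent
```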
Definition. basis & dimension
$\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is called a <basis> of $\mathcal{L}$ if
- $\mathcal{L} = \left[ \mathbf{x}_1, \dots, \mathbf{x}_k \right]$
- $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is linearly independent.
Also, we call $k$ the <dimension> of $\mathcal{L}$, written $\dim(\mathcal{L})$.
Theorem.
If $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is a basis of $\mathcal{L}$ and $\mathbf{x} \in \mathcal{L}$, then there exist unique $c_1, \dots, c_k \in \mathbb{R}$ s.t.
\[\mathbf{x} = c_1 \mathbf{x}_1 + \cdots + c_k \mathbf{x}_k\]That is, a <basis> guarantees the uniqueness of the representation!
Definition. orthogonal & orthonormal
A basis $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ of $\mathcal{L}$ is said to be <orthogonal> if $\mathbf{x}_i^T \mathbf{x}_j = 0$ for $i \ne j$.
If, in addition, each basis vector $\mathbf{x}_i$ has norm one, we call the basis <orthonormal>.
Theorem.
If $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is an orthonormal basis of $\mathcal{L}$ and $\mathbf{y} \in \mathcal{L}$, then
\[\mathbf{y} = \left( \mathbf{x}_1^T \mathbf{y} \right) \mathbf{x}_1 + \cdots + \left( \mathbf{x}_k^T \mathbf{y} \right) \mathbf{x}_k\]That is, with an <orthonormal basis>, every coefficient $c_i$ representing the vector $\mathbf{y}$ can be pinned down uniquely, by plain inner products!
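The theorem above can be checked directly in NumPy (the basis and $\mathbf{y}$ below are my own toy examples): the coefficients of $\mathbf{y}$ are just the inner products $\mathbf{x}_i^T \mathbf{y}$.

```python
import numpy as np

# Orthonormal basis of a 2-dim subspace L of R^3.
x1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
x2 = np.array([0.0, 0.0, 1.0])

y = 3.0 * x1 - 2.0 * x2            # some y in L

# Coefficients recovered by inner products with the basis vectors.
c1, c2 = x1 @ y, x2 @ y
print(c1, c2)                      # ~3.0, ~-2.0

y_rebuilt = c1 * x1 + c2 * x2
print(np.allclose(y, y_rebuilt))   # True
```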
Rank Theory
Definition. Column space & Row space & Null space
1. The <column space> $\mathcal{C}(A)$ is the linear space spanned by the columns of $A$.
= $\left[ \mathbf{a}_1, \dots, \mathbf{a}_p \right]$
= $\left\{ A\mathbf{x} : \mathbf{x} \in \mathbb{R}^p \right\}$
2. The <row space> $\mathcal{R}(A)$ is the linear space spanned by the rows of $A$.
NOTE: $\mathcal{R}(A) = \mathcal{C}(A^T)$
3. The <null space> $\mathcal{N}(A)$ is the linear space of all $\mathbf{x}$ satisfying $A\mathbf{x} = \mathbf{0}$.
= $\left\{ \mathbf{x} \in \mathbb{R}^p: A \mathbf{x} = \mathbf{0} \right\}$
NOTE: If $A$ is invertible, the only $\mathbf{x}$ satisfying $A\mathbf{x} = \mathbf{0}$ is $\mathbf{0}$, so $\mathcal{N}(A) = \{\mathbf{0}\}$.
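A small NumPy sketch of this note (the matrices are my own toy examples), using the rank-nullity relation $\dim\mathcal{N}(A) = p - \text{rank}(A)$:

```python
import numpy as np

A_inv = np.array([[2.0, 0.0],
                  [0.0, 3.0]])           # invertible, so N(A) = {0}
A_sing = np.array([[1.0, 2.0],
                   [2.0, 4.0]])          # rank 1, so N(A) is a line

# dim N(A) = p - rank(A)  (rank-nullity)
print(2 - np.linalg.matrix_rank(A_inv))    # 0
print(2 - np.linalg.matrix_rank(A_sing))   # 1

x = np.array([2.0, -1.0])
print(A_sing @ x)                          # [0. 0.]: x lies in N(A_sing)
```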
Definition. Rank
The <rank> of $A$ is the dimension of $\mathcal{C}(A)$.
Theorem. Fundamental Theorem of Linear Algebra
Part 1: (Rank-Nullity Theorem; Rank Theorem)
The column and row spaces of an $m \times n$ matrix $A$ both have dimension $r$, the <rank> of the matrix.
- the nullspace has dimension $n-r$
- the left nullspace has dimension $m-r$
Part 2:
The <Null space> and <Row space> are <orthogonal>.
The <left null space> and the <column space> are also <orthogonal>.
Part 3:
Any matrix $M$ can be written in the form
\[U \Sigma V^T\]where
- $U$ is an $m\times m$ unitary matrix.
- $\Sigma$ is an $m\times n$ matrix with non-negative values on the diagonal.
- $V$ is an $n\times n$ unitary matrix.
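Part 3 is the singular value decomposition, which NumPy computes directly (the random $4\times 3$ matrix below is my own example):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 3))              # an m x n matrix, m=4, n=3

# full_matrices=True returns U (m x m), singular values S, and V^T (n x n).
U, S, Vt = np.linalg.svd(M, full_matrices=True)

Sigma = np.zeros((4, 3))                 # rebuild the m x n Sigma
Sigma[:3, :3] = np.diag(S)               # non-negative values on the diagonal

print(np.allclose(M, U @ Sigma @ Vt))    # True: M = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))   # True: U is orthogonal (unitary)
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True: V is orthogonal (unitary)
```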
Theorem. Rank Theorem
For any $A \in \mathbb{R}^{n \times p}$,
\[\dim (\mathcal{C}(A)) = \dim (\mathcal{R}(A))\]So by the <Rank Theorem>, it does not matter whether we define the <rank> of $A$ as $\dim (\mathcal{C}(A))$ or as $\dim (\mathcal{R}(A))$!
For the proof, see here 👉 proof
Properties. Rank
1. $\text{rank}(A) \le \min(n, p)$
2. $\text{rank}(A) = \text{rank}(A^T)$ (by <Rank Theorem>)
3. $\text{rank}(A) + \dim(\mathcal{N}(A)) = p$
4. If $A \in \mathbb{R}^{n\times n}$, T.F.A.E.
- $A$ is invertible
- $\text{rank}(A) = n$
- $\mathcal{N}(A) = \{\mathbf{0}\}$
- $\det (A) \ne 0$
5. $\text{rank}(AB) = \text{rank}(A)$ if $B$ is invertible
6. $\text{rank}(AB) = \text{rank}(B)$ if $A$ is invertible
7. $\text{rank}(AB) \le \min \{ \text{rank}(A), \text{rank}(B)\}$
8. $\text{rank}(AA^T) = \text{rank}(A^TA) = \text{rank}(A)$
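Several of these properties can be spot-checked in NumPy (random matrices are my own examples; a random square Gaussian matrix is invertible with probability 1):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(3, 3))           # invertible with probability 1

rank = np.linalg.matrix_rank
print(rank(A))                        # 3 = min(5, 3)       (property 1, generically)
print(rank(A) == rank(A.T))           # True                (property 2)
print(rank(A @ B) == rank(A))         # True: B invertible  (property 5)
print(rank(A.T @ A) == rank(A))       # True                (property 8)
```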
Definition. Determinant
The <determinant> of a matrix $A \in \mathbb{R}^{n \times n}$ is
\[\sum_{\pi} \text{sgn} (\pi) \cdot \left( a_{1\pi(1)} \cdots a_{n\pi(n)}\right)\]where the summation is taken over all permutations of $\{ 1, \dots, n\}$.
The definition of the <determinant> can be quite abstract. For a $3\times3$ matrix, it can be computed easily (Sarrus' rule):
\[\det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = aei + bfg + cdh - ceg - bdi - afh\]
Properties. Determinant
The following properties describe the <determinant> in quite abstract terms.
\[A, B \in \mathbb{R}^{n\times n}\]- Multilinear (= linear for each column & row)
- Alternating
- Normalized
The following are properties of the <determinant> that are used very, very often!
- $\det(c A) = c^n \det (A)$
- $\det (A) = \det (A^T)$
- $\det(AB) = \det(A)\det(B)$
- $\det(A^{-1}) = 1/\det(A)$
- If $A$ is triangular, then $\displaystyle \det A = \prod^n_{i=1} a_{ii}$
- For eigenvalues $\lambda_i$ of $A$, $\displaystyle \det (A) = \prod^n_{i=1} \lambda_i$
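These properties can also be spot-checked numerically (random matrices are my own examples; eigenvalues may be complex, so we compare against the real part of their product):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
c = 2.0

det = np.linalg.det
print(np.isclose(det(c * A), c**3 * det(A)))    # det(cA) = c^n det(A), n=3
print(np.isclose(det(A), det(A.T)))             # det(A) = det(A^T)
print(np.isclose(det(A @ B), det(A) * det(B)))  # det(AB) = det(A)det(B)

# det(A) equals the product of the eigenvalues of A.
eig = np.linalg.eigvals(A)
print(np.isclose(det(A), np.prod(eig).real))    # True
```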
In what follows, many deeper topics unfold: <Eigen Value> & <Eigen Vector>, <Matrix Derivative>, <Spectral Decomposition>, and more 🤩