Notes from the β€˜Statistical Data Mining’ course I took at university in the spring semester of 2021. Corrections are always welcome :)



(Review) Matrix

✨ (Important) Unless stated otherwise, every vector is a β€œcolumn vector”. ✨

A $3 \times 1$ vector $\mathbf{x}$ is

\[\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\]

A <matrix> is a β€œ(column) vector” whose elements are β€œvectors”.

A $4 \times 3$ matrix $\mathbf{X}$ is

\[\mathbf{X} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \mathbf{x}_4^T \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix}\]
  • $X_i$: the $i$-th element of $X$ = $\mathbf{x}_i$
  • $X_{ij}$: the $j$-th element of $\mathbf{x}_i$, the $i$-th element of $X$

Internalizing this way of seeing πŸ‘€ a matrix as a column vector of row vectors is important!
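For instance, the row-vector view maps directly onto NumPy indexing; a small sketch (mine, not from the lecture) using the $4 \times 3$ matrix above:

```python
import numpy as np

# The 4x3 matrix X above: a (column) stack of the row vectors x_1^T, ..., x_4^T.
X = np.arange(1, 13).reshape(4, 3)

x_2 = X[1]        # X_2 = x_2, the 2nd element of X (NumPy is 0-indexed)
X_23 = X[1, 2]    # X_{23}: the 3rd element of x_2
```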

Matrix Multiplication

ν–‰λ ¬ κ³±μ…ˆμ˜ μ‹œμž‘μ€ μ’Œν‘œ λ³€ν™˜μ΄λ‹€.

\[\begin{pmatrix} c_1 & c_2 & c_3 & c_4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = c_1 x + c_2 y + c_3 z + c_4 t\]

μœ„μ˜ μ˜ˆμ‹œλŠ” κ°„λ‹¨ν•œ κ±°κ³ , 쑰금 λ³΅μž‘ν•˜κ²Œ κ°€λ©΄,

\[X \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \mathbf{x}_4^T \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^T \mathbf{v} \\ \mathbf{x}_2^T \mathbf{v} \\ \mathbf{x}_3^T \mathbf{v} \\ \mathbf{x}_4^T \mathbf{v} \end{pmatrix}, \quad \text{where} \; \mathbf{v} = (x, y, z, t)^T\]

As a tool for coordinate transformations, or linear transformations, matrix multiplication works as follows.

For $A \in \mathbb{R}^{n\times p}$, $\mathbf{x} \in \mathbb{R}^{p\times 1}$

\[A \mathbf{x} \in \mathbb{R}^{n\times 1}\]

That is, it maps a $p$-dimensional vector into dimension $n$. For this we need $n$ vectors of dimension $p$, and these are exactly the matrix $A$…!
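Numerically, this is just the shape rule of `A @ x`; a minimal NumPy sketch (the concrete numbers are my own):

```python
import numpy as np

n, p = 4, 3
A = np.arange(1, 13).reshape(n, p)   # A in R^{4x3}: four 3-dim row vectors
x = np.array([1.0, 0.0, -1.0])       # x in R^3

y = A @ x                            # y = Ax in R^4
# Each entry of y is one row of A dotted with x: a 3-dim vector mapped to dim 4.
```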


Definition. Design Matrix πŸ”₯

μ •κ·œ μˆ˜μ—…μ˜ 첫번째 κ°•μ˜μ—μ„œλŠ” <linear regression>을 μˆ˜ν•™μ μœΌλ‘œ μ •μ˜ν•˜λ©΄μ„œ <design matrix> $X$λ₯Ό μ œμ‹œν•œλ‹€. 이 <design matrix>λŠ” $p$-dim input features vector $n$개λ₯Ό λͺ¨μ€ $n \times p$ μ°¨μ›μ˜ 행렬이닀.

\[X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} \in \mathbb{R}^{n\times p}\]

In <linear regression>, we multiply by the coefficient vector $\beta$ to get $X \beta \in \mathbb{R}^{n\times 1}$. This gives an $n\times 1$ column vector, and I once wondered: if $n \ge 1{,}000$, isn't $X\beta$ a vector of absurdly high dimension, a waaay too large linear transformation? That question was resolved the moment I saw the equation below.

\[Y = X\beta + \epsilon\]

즉, $X\beta$μ—μ„œ $X$λŠ” μ„ ν˜•λ³€ν™˜μœΌλ‘œ μ‚¬μš©λœ 것이 μ•„λ‹ˆλ‹€. 였히렀 <regression>을 κ·Έ 자체λ₯Ό μˆ˜ν–‰ν•˜κΈ° μœ„ν•΄ κ³ μ•ˆλœ β€œν–‰λ ¬β€μ΄λΌκ³  λ°›μ•„λ“€μ΄λŠ” 것이 μ’‹λ‹€! 😁


Linear Algebra


Definition. Vector Space

A <vector space> is a set, consisting of vectors, equipped with scalar multiplication and vector addition.


Definition. Linear Function

For two vector spaces $V$ and $W$, a function $f: V \longrightarrow W$ is said to be <linear> if

\[f(c\mathbf{x} + \mathbf{y}) = cf(\mathbf{x}) + f(\mathbf{y}) \quad \text{for all} \; \mathbf{x}, \mathbf{y} \in V \; \text{and} \; c \in \mathbb{R}\]


Theorem.

Every <linear function between Euclidean spaces> can be represented as a matrix multiplication.

\[f(\mathbf{x}) = A \mathbf{x}\]


Note.

An $n \times p$ matrix $A = (a_{ij})$ can be interpreted as a <linear function from $\mathbb{R}^p$ to $\mathbb{R}^n$>!!

\[f_A (\mathbf{x}) = A\mathbf{x}\]

For an $n \times p$ matrix, we usually interpret

  • $n$ = sample size
  • $p$ = # of input variables


Definition. Linear Subspace

$\mathcal{L} \subset \mathbb{R}^n$ is called a <linear subspace> of $\mathbb{R}^n$ if

\[\mathbf{x}, \mathbf{y} \in \mathcal{L} \quad \text{and} \quad c \in \mathbb{R} \implies c \mathbf{x} + \mathbf{y} \in \mathcal{L}\]


Definition. span

$\mathcal{L}$ is <spanned> by $\left\{ \mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ if

\[\mathcal{L} = \left\{ c_1 \mathbf{x}_1 + \cdots + c_k \mathbf{x}_k : c_j \in \mathbb{R} \right\}\]

From now on, we denote such a spanned subspace as follows.

\[\left[ \mathbf{x}_1, \dots, \mathbf{x}_k \right]\]


Definition. Linearly Independent

$\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is said to be <linearly independent> if

\[c_1 \mathbf{x}_1 + \cdots + c_k \mathbf{x}_k = 0 \iff c_1 = \cdots = c_k = 0\]

That is, for the linear combination to be 0, all the coefficients $c_i$ must be 0!
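A standard numerical check (my sketch, not the course's): stack the vectors as columns; they are linearly independent iff that matrix has full column rank.

```python
import numpy as np

# Two vectors in R^3, stacked as columns.
V = np.column_stack([np.array([1.0, 0.0, 1.0]),
                     np.array([0.0, 1.0, 1.0])])
independent = np.linalg.matrix_rank(V) == V.shape[1]   # full column rank

# Make the 2nd vector a multiple of the 1st: now dependent.
W = np.column_stack([V[:, 0], 2.0 * V[:, 0]])
dependent = np.linalg.matrix_rank(W) < W.shape[1]
```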


Theorem. T.F.A.E.

  1. $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is linearly independent.
  2. No $\mathbf{x}_i$ can be represented as a linear combination of $\left\{ \mathbf{x}_j : j \ne i \right\}$.
    • That is, \(\displaystyle \mathbf{x}_i = \sum_{j \ne i} c_j \mathbf{x}_j\) is impossible.


Definition. basis & dimension

$\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is called a <basis> of $\mathcal{L}$ if

  1. $\mathcal{L} = \left[ \mathbf{x}_1, \dots, \mathbf{x}_k \right]$
  2. $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is linearly independent.

Also, we call $k$ the <dimension> of $\mathcal{L}$, written $\dim(\mathcal{L})$.


Theorem.

If $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is a basis of $\mathcal{L}$ and $\mathbf{y} \in \mathcal{L}$, there exist unique $c_1, \dots, c_k \in \mathbb{R}$ s.t.

\[\mathbf{y} = c_1 \mathbf{x}_1 + \cdots + c_k \mathbf{x}_k\]

That is, representation in a <basis> is guaranteed to be unique!


Definition. orthogonal & orthonormal

A basis $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ of $\mathcal{L}$ is said to be <orthogonal> if $\mathbf{x}_i^T \mathbf{x}_j = 0$ for $i \ne j$.

If, in addition, the norm of each basis vector $\mathbf{x}_i$ is one, we call the basis <orthonormal>.


Theorem.

If $\left\{\mathbf{x}_1, \dots, \mathbf{x}_k \right\}$ is an orthonormal basis of $\mathcal{L}$ and $\mathbf{y} \in \mathcal{L}$, then

\[\mathbf{y} = \left( \mathbf{x}_1^T \mathbf{y} \right) \mathbf{x}_1 + \cdots + \left( \mathbf{x}_k^T \mathbf{y} \right) \mathbf{x}_k\]

That is, with an <orthonormal basis>, every coefficient $c_i$ representing the vector $\mathbf{y}$ can be pinned down uniquely (and explicitly)!
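The formula $c_i = \mathbf{x}_i^T \mathbf{y}$ can be verified directly; in this sketch the orthonormal basis comes from a QR factorization (a convenience of mine, not part of the theorem):

```python
import numpy as np

# Build an orthonormal basis of a 2-dim subspace of R^3 via (reduced) QR.
M = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(M)               # columns of Q: orthonormal basis x_1, x_2

y = 2.0 * Q[:, 0] - 3.0 * Q[:, 1]    # a vector y in the subspace

c = Q.T @ y                          # c_i = x_i^T y recovers the coefficients
y_rec = Q @ c                        # y = sum_i c_i x_i
```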


Rank Theory

Definition. Column space & Row space & Null space

\[A \in \mathbb{R}^{n \times p}\] \[A\mathbf{x} = \begin{bmatrix} \mid & \cdots & \mid \\ \mathbf{a}_1 & \cdots & \mathbf{a}_p \\ \mid & \cdots & \mid \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix} = x_1 \mathbf{a}_1 + \cdots + x_p \mathbf{a}_p\]

1. The <Column space> $\mathcal{C}(A)$ is the linear space spanned by the columns of $A$.

= $\left[ \mathbf{a}_1, \dots, \mathbf{a}_p \right]$

= $\left\{ A\mathbf{x} : \mathbf{x} \in \mathbb{R}^p \right\}$

2. The <Row space> $\mathcal{R}(A)$ is the linear space spanned by the rows of $A$.

NOTE: $\mathcal{R}(A) = \mathcal{C}(A^T)$

3. The <Null space> $\mathcal{N}(A)$ is the linear space consisting of all $\mathbf{x}$ with $A\mathbf{x} = \mathbf{0}$.

= $\left\{ \mathbf{x} \in \mathbb{R}^p: A \mathbf{x} = \mathbf{0} \right\}$

NOTE: If $A$ is invertible, the only $\mathbf{x}$ satisfying $A\mathbf{x} = \mathbf{0}$ is $\mathbf{0}$, so $\mathcal{N}(A) = \{\mathbf{0}\}$.
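A tiny numerical illustration (my own numbers): a rank-deficient $A$ has a nonzero null vector, while an invertible $B$ forces $\mathbf{x} = \mathbf{0}$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])           # rank 1: the rows are proportional
v = np.array([2.0, -1.0])            # a nonzero vector in N(A)
Av = A @ v                           # equals the zero vector

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])           # det(B) = -2 != 0, so B is invertible
x = np.linalg.solve(B, np.zeros(2))  # the only solution of Bx = 0 is x = 0
```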

Definition. Rank

The <rank> of $A$ is the dimension of $\mathcal{C}(A)$.

Theorem. Fundamental Theorem of Linear Algebra

Part 1: (Rank-Nullity Theorem; Rank Theorem)

The column and row spaces of an $m \times n$ matrix $A$ both have dimension $r$, the <rank> of the matrix.

  • the nullspace has dimension $n-r$
  • the left nullspace has dimension $m-r$

Part 2:

The <Null space> and <Row space> are <orthogonal>.

The <Left Null space> and the <Column space> are also <orthogonal>.

Part 3:

Any matrix $M$ can be written in the form

\[U \Sigma V^T\]

where

  • $U$ is an $m\times m$ unitary matrix.
  • $\Sigma$ is an $m\times n$ matrix with non-negative values on the diagonal.
  • $V$ is an $n\times n$ unitary matrix.
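Part 3 is the singular value decomposition, which NumPy computes directly; a sketch with an arbitrary $2\times 3$ matrix (the numbers are mine):

```python
import numpy as np

M = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])     # an arbitrary 2x3 matrix

U, s, Vt = np.linalg.svd(M)          # full SVD: M = U Sigma V^T
Sigma = np.zeros_like(M)             # 2x3, singular values on the diagonal
Sigma[:len(s), :len(s)] = np.diag(s)

M_rec = U @ Sigma @ Vt               # reconstructs M
```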

Theorem. Rank Theorem

For any $A \in \mathbb{R}^{n \times p}$,

\[\dim (\mathcal{C}(A)) = \dim (\mathcal{R}(A))\]

So, by the <Rank Theorem>, it makes no difference whether we define the <rank> of the matrix $A$ as $\dim (\mathcal{C}(A))$ or as $\dim (\mathcal{R}(A))$!

For the proof, see here πŸ‘‰ proof



Properties. Rank

\[A, B \in \mathbb{R}^{n\times p}\]

1. $\text{rank}(A) \le \min (n, p)$

2. $\text{rank}(A) = \text{rank}(A^T)$ (by <Rank Theorem>)

3. $\text{rank}(A) + \dim(\mathcal{N}(A)) = p$

4. If $A \in \mathbb{R}^{n\times n}$, T.F.A.E.

  • $A$ is invertible
  • $\text{rank}(A) = n$
  • $\mathcal{N}(A) = \{\mathbf{0}\}$
  • $\det (A) \ne 0$

5. $\text{rank}(AB) = \text{rank}(A)$ if $B$ is invertible

6. $\text{rank}(AB) = \text{rank}(B)$ if $A$ is invertible

7. $\text{rank}(AB) \le \min \{ \text{rank}(A), \text{rank}(B)\}$

8. $\text{rank}(AA^T) = \text{rank}(A^TA) = \text{rank}(A)$
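Several of these can be sanity-checked numerically; a sketch with a random $4\times 3$ matrix (full rank holds generically for Gaussian entries):

```python
import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))          # n = 4, p = 3; generically rank 3

r = matrix_rank(A)

# Property 2 and property 8: rank is preserved under transpose and Gram products.
same = (matrix_rank(A.T) == r
        and matrix_rank(A @ A.T) == r
        and matrix_rank(A.T @ A) == r)

# Property 3: rank(A) + dim N(A) = p; here N(A) = {0}, so r should equal p = 3.
```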



Definition. Determinant

The <determinant> of a matrix $A \in \mathbb{R}^{n \times n}$ is

\[\sum_{\pi} \text{sgn} (\pi) \cdot \left( a_{1\pi(1)} \cdots a_{n\pi(n)}\right)\]

where the summation is taken over all permutations of $\{ 1, \dots, n\}$.

This is a truly abstract definition of the <determinant>. For a $3\times3$ matrix it can be computed easily, e.g., with the familiar rule of Sarrus.


Properties. Determinant

μ΄μ–΄μ§€λŠ” μ„±μ§ˆλ“€μ€ <Determinant>에 λŒ€ν•œ μ„±μ§ˆμ„ 정말 Abstract ν•œ μš©μ–΄λ‘œ κΈ°μˆ ν•œ 것이닀.

\[A, B \in \mathbb{R}^{n\times n}\]
  • Multilinear (= linear in each column & row)
\[\det (\mathbf{a}_1, \dots, c\mathbf{a}_i + \mathbf{d}, \dots, \mathbf{a}_n) = c \det (\mathbf{a}_1, \dots, \mathbf{a}_i, \dots, \mathbf{a}_n) + \det (\mathbf{a}_1, \dots, \mathbf{d}, \dots, \mathbf{a}_n)\]
  • Alternating
\[\det (\mathbf{a}_1, \dots, \mathbf{a}_i, \dots, \mathbf{a}_j, \dots, \mathbf{a}_n) = -\det (\mathbf{a}_1, \dots, \mathbf{a}_j, \dots, \mathbf{a}_i, \dots, \mathbf{a}_n)\]
  • Normalized
\[\det (\mathbf{I}_n) = 1\]

μ΄μ–΄μ§€λŠ” μ„±μ§ˆλ“€μ€ <Dterminant>μ—μ„œ μ•„μ£Ό 자주 μ‚¬μš©ν•˜λŠ” μ„±μ§ˆλ“€μ΄λ‹€!

  • $\det(c A) = c^n \det (A)$
  • $\det (A) = \det (A^T)$
  • $\det(AB) = \det(A)\det(B)$
  • $\det(A^{-1}) = 1/\det(A)$
  • If $A$ is triangular, then $\displaystyle \det A = \prod^n_{i=1} a_{ii}$
  • For eigenvalues $\lambda_i$ of $A$, $\displaystyle \det (A) = \prod^n_{i=1} \lambda_i$
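These identities are easy to confirm numerically; a quick sketch with random matrices (for which $\det(A) \ne 0$ holds generically):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
c, n = 2.0, 3

d1 = np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A))
d2 = np.isclose(np.linalg.det(A), np.linalg.det(A.T))
d3 = np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
d4 = np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))

# det(A) equals the product of the eigenvalues (complex pairs multiply to a real).
d5 = np.isclose(np.prod(np.linalg.eigvals(A)), np.linalg.det(A))
```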

μ΄μ–΄μ§€λŠ” λ‚΄μš©μ—μ„  <Eigen Value>와 <Eigen Vector>, <Matrix Derivative>, <Spectral Decomposition> λ“± 더 μ§€λž„ λ§žμ€ λ‚΄μš©λ“€μ΄ νŠ€μ–΄λ‚˜μ˜¨λ‹€ 🀩

πŸ‘‰ Supp-1: Linear Algebra - 2