Sequential Density Estimation
This post is a personal summary I put together while taking the "Computer Vision" course in the second semester of 2020 and studying on my own. Corrections are always welcome :)
Bayes Theorem
\[P(H \mid E) = \frac{P(E \mid H) P(H)}{P(E)}\]
Bayes' theorem relates the prior probability to the posterior probability. Its core idea is to update the probability $P(H)$ once the event $E$ has occurred. Put differently, it updates a distribution $P(x)$ once data $X$ has been observed.
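Terminology
- $P(H)$: Prior probability
  - The probability (or distribution) assigned based on the information currently available
- $P(H \mid E)$: Posterior probability
  - The probability (or distribution) updated after the event has occurred
  - It is the probability revised by taking additional information or data into account.
- $P(E \mid H)$: Likelihood
  - The probability of observing the event $E$ given that $H$ holds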
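The update described above can be sketched for a discrete set of hypotheses. This is only an illustration: the function name `bayes_update`, the two hypotheses, and all numbers are made up and do not come from the course material.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | E) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(E | H) for the observed event E
    """
    # Numerator of Bayes' theorem: P(E | H) * P(H)
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence P(E) = sum over H of P(E | H) * P(H)
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical numbers: two hypotheses and one observed event E.
prior = {"rain": 0.3, "dry": 0.7}
likelihood = {"rain": 0.9, "dry": 0.2}  # P(E | H) for E = "the ground is wet"
posterior = bayes_update(prior, likelihood)
print(posterior)
```

Observing $E$ moves probability toward the hypothesis that explains it better, so the posterior for `"rain"` ends up larger than its prior of 0.3.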
Markov Assumption
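The Markov assumption simplifies a complex stochastic process: the next state depends only on the current state.
\[p(x_{t+1} \mid x_{1:t}) = p(x_{t+1} \mid x_t)\]
So, given $x_t$, the next state $x_{t+1}$ is conditionally independent of the earlier states $x_{1:t-1}$.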
Density Estimation
Density estimation is the task of estimating the properties of the probability distribution of a variable $x_t$ from the observed data $z_{1:t}$, that is, estimating $p(x_t \mid z_{1:t})$.
Estimating the density of $x_t$ means estimating its probability density function (pdf).
Hidden Markov Model
A Hidden Markov Model (HMM) is a model used to handle sequential data.
An HMM follows the Markov assumption, so it satisfies the two properties below.
\[\begin{aligned} p(x_t \mid x_{1:t-1}) &= p(x_t \mid x_{t-1})\\ p(z_t \mid x_{1:t}) &= p(z_t \mid x_t) \end{aligned}\]
These two properties are the key tools for density estimation on sequential data!
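To make the two properties concrete, here is a minimal sketch that samples from a two-state HMM. The tables `INITIAL`, `TRANSITION`, `EMISSION` and the function `sample_hmm` are hypothetical names with made-up numbers, chosen only for illustration.

```python
import random

# Hypothetical two-state HMM; all numbers are made up for illustration.
STATES = [0, 1]
INITIAL = [0.6, 0.4]        # p(x_1)
TRANSITION = [[0.7, 0.3],   # p(x_t | x_{t-1}); row = previous state
              [0.2, 0.8]]
EMISSION = [[0.9, 0.1],     # p(z_t | x_t); row = current state
            [0.3, 0.7]]


def sample_hmm(length, rng):
    """Draw hidden states x_{1:t} and observations z_{1:t}."""
    xs, zs = [], []
    for step in range(length):
        # First property: the new state is drawn using only the previous state.
        weights = INITIAL if step == 0 else TRANSITION[xs[-1]]
        x = rng.choices(STATES, weights=weights)[0]
        # Second property: the observation is drawn using only the current state.
        z = rng.choices([0, 1], weights=EMISSION[x])[0]
        xs.append(x)
        zs.append(z)
    return xs, zs


xs, zs = sample_hmm(10, random.Random(0))
print("hidden  :", xs)
print("observed:", zs)
```

In practice only `zs` would be available, and the hidden sequence `xs` is what has to be inferred.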
Sequential Density Estimation
Sequential Density Estimation is the process of estimating a pdf over sequential data.
In other words, it estimates the pdf $p(x_t \mid z_{1:t})$.
The process is a bit involved, so let's go through it slowly.
First, rewrite the conditional probability $p(x_t \mid z_{1:t})$ using the definition of conditional probability, the same identity that underlies Bayes' theorem.
\[\begin{aligned} p(x_t \mid z_{1:t}) = \frac{p(x_t, z_{1:t})}{p(z_{1:t})} \end{aligned}\]
Here the denominator $p(z_{1:t})$ is the probability of the observations. It does not depend on $x_t$, so we treat it as a constant.
Therefore $p(x_t \mid z_{1:t})$ satisfies the following.
\[\begin{aligned} p(x_t \mid z_{1:t}) \propto p(x_t, z_{1:t}) \end{aligned}\]
Our goal has now changed from finding $p(x_t \mid z_{1:t})$ to finding $p(x_t, z_{1:t})$.
\[\begin{aligned} p(x_t, z_{1:t}) = \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]
The expression above describes how to obtain $p(x_t, z_{1:t})$. It is easier to read from right to left than from left to right.
Let's read
\[\begin{aligned} \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]
starting from the innermost integral.
\[\begin{aligned} \int p(\{ x_1, \cdots, x_{t-1}, x_t \}, z_{1:t}) \; dx_1 \end{aligned}\]
This is the step that marginalizes $x_1$ out of $p(\{ x_1, \cdots, x_{t-1}, x_t \}, z_{1:t})$.
Integrating a joint density over $x_1$ removes $x_1$ by the law of total probability. No independence assumption is needed for this (in an HMM, $x_1$ is in fact not independent of $x_2$ or $z_1$), so the result is
\[\int p(x_{1:t}, z_{1:t}) \; dx_1 = p(x_{2:t}, z_{1:t})\]
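For discrete variables the integral becomes a sum, which makes the marginalization step easy to sketch. The joint table below is a made-up example with two binary variables that are deliberately dependent on each other; `marginalize_first` is a hypothetical helper name.

```python
# Hypothetical joint table p(x_1, x_2) over two binary variables.
# The variables are dependent: p(x_1=0, x_2=0) = 0.42, while
# p(x_1=0) * p(x_2=0) = 0.6 * 0.5 = 0.30.
joint = {
    (0, 0): 0.42, (0, 1): 0.18,
    (1, 0): 0.08, (1, 1): 0.32,
}


def marginalize_first(joint):
    """Sum out the first variable: p(x_2) = sum over x_1 of p(x_1, x_2)."""
    marginal = {}
    for (x1, x2), p in joint.items():
        marginal[x2] = marginal.get(x2, 0.0) + p
    return marginal


p_x2 = marginalize_first(joint)
print(p_x2)
```

Summing over $x_1$ yields a valid distribution over $x_2$ even though the two variables are not independent.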
Repeating this process for $x_2$ through $x_{t-1}$ gives the following result.
\[\begin{aligned} p(x_t, z_{1:t}) = \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]
Next, let's look at the integrand of the nested integral, $p(x_{1:t}, z_{1:t})$.
By the product rule (the identity behind Bayes' theorem), $p(x_{1:t}, z_{1:t})$ can be written as follows.
\[\begin{aligned} p(x_{1:t}, z_{1:t}) = p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t}) \end{aligned}\]
So the nested integral can be rewritten as
\[\begin{aligned} & \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \\ =& \int \cdots \int \int p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]
Now focus on the function $p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t})$.
First, the $p(x_{1:t})$ part can be expressed as follows using the probabilistic dependency structure of the HMM (each state depends only on the previous one).
\[p(x_{1:t}) = p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1)\]
The $p(z_{1:t} \mid x_{1:t})$ part can likewise be expressed using the dependency structure (each observation depends only on its own state).
\[p(z_{1:t} \mid x_{1:t}) = p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1)\]
Putting the two together:
\[\begin{aligned} p(z_{1:t} \mid x_{1:t}) & \cdot p(x_{1:t}) \\ = \left( p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1) \right) & \cdot \left( p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1) \right) \end{aligned}\]
Substituting this into the integral gives
\[\begin{aligned} & \int \cdots \int \int p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \\ = & \int \cdots \int \int \left( p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1) \right) \cdot \left( p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1) \right) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]
The expression has become very, very complicated :(
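The factorized integrand can be evaluated directly for a small discrete HMM, with the nested integral replaced by a sum over every hidden path. This is a rough sketch: the tables and the function names `joint` and `brute_force_marginal` are hypothetical, and the numbers are made up.

```python
from itertools import product

# Hypothetical two-state HMM (made-up numbers).
INITIAL = [0.6, 0.4]                   # p(x_1)
TRANSITION = [[0.7, 0.3], [0.2, 0.8]]  # p(x_t | x_{t-1})
EMISSION = [[0.9, 0.1], [0.3, 0.7]]    # p(z_t | x_t)


def joint(xs, zs):
    """p(x_{1:t}, z_{1:t}) as the product of the HMM factors."""
    p = INITIAL[xs[0]] * EMISSION[xs[0]][zs[0]]
    for prev, cur, z in zip(xs, xs[1:], zs[1:]):
        p *= TRANSITION[prev][cur] * EMISSION[cur][z]
    return p


def brute_force_marginal(zs):
    """p(x_t, z_{1:t}) for each value of x_t, summing over all x_{1:t-1}."""
    result = [0.0, 0.0]
    for xs in product([0, 1], repeat=len(zs)):
        result[xs[-1]] += joint(xs, zs)
    return result


zs = [0, 0, 1]
print(brute_force_marginal(zs))
```

With binary states the loop visits $2^t$ paths, so this direct evaluation only works for very short sequences.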
However, this nested integral turns out to have a surprisingly nice property!!
Let's pull every factor that does not involve the integration variable out of each integral. This gives the following expression.
\[\begin{aligned} & \int \cdots \int \int \left( p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1) \right) \cdot \left( p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1) \right) \; dx_1 dx_2 \cdots dx_{t-1} \\ =& \; p(z_t \mid x_t) \left( \int_{x_{t-1}} p(x_t \mid x_{t-1}) p(z_{t-1} \mid x_{t-1}) \cdots \left( \int_{x_2} p(x_3 \mid x_2)p(z_2 \mid x_2) \left( \int_{x_1} p(x_2 \mid x_1)p(z_1 \mid x_1) p(x_1) \; dx_1 \right) dx_2 \right) \cdots dx_{t-1} \right) \end{aligned}\]
The expression got even longer :(
To solve the nested integral, let's start from the innermost one.
\[\int_{x_1} p(x_2 \mid x_1) p(z_1 \mid x_1) p(x_1) \; dx_1\]
By the definition of conditional probability, $p(z_1 \mid x_1) p(x_1) = \dfrac{p(z_1, x_1)}{p(x_1)} p(x_1) = p(z_1, x_1)$.
Also, $p(z_1, x_1) = p(x_1 \mid z_1) \, p(z_1)$, and since $p(z_1)$ is a constant, $p(z_1, x_1) \propto p(x_1 \mid z_1)$.
To summarize,
\[p(z_1 \mid x_1) p(x_1) = p(z_1, x_1) \propto p(x_1 \mid z_1)\]
Therefore
\[\begin{aligned} & \int_{x_1} p(x_2 \mid x_1) p(z_1 \mid x_1) p(x_1) \; dx_1 \\ \propto& \int_{x_1} p(x_2 \mid x_1) p(x_1 \mid z_1) \; dx_1 \end{aligned}\]
Here, $p(x_2 \mid x_1) p(x_1 \mid z_1)$ can be rearranged into $p(x_2, x_1 \mid z_1)$. The third line below uses the HMM dependency structure: given $x_1$, the state $x_2$ does not depend on $z_1$.
\[\begin{aligned} p(x_2, x_1 \mid z_1) &= \frac{p(x_2 \cap x_1 \cap z_1)}{p(z_1)} \\ &= \frac{p(x_2 \mid (x_1 \cap z_1)) \cdot p(x_1 \cap z_1)}{p(z_1)} \\ &= p(x_2 \mid x_1) \cdot \frac{p(x_1 \cap z_1)}{p(z_1)} \\ &= p(x_2 \mid x_1) \cdot p(x_1 \mid z_1) \end{aligned}\]
So the integral can be rewritten as follows.
\[\begin{aligned} & \int_{x_1} p(x_2 \mid x_1) p(x_1 \mid z_1) \; dx_1 \\ =& \int_{x_1} p(x_2, x_1 \mid z_1) \; dx_1 \end{aligned}\]
Evaluating the integral marginalizes $x_1$ out.
\[\begin{aligned} & \int_{x_1} p(x_2, x_1 \mid z_1) \; dx_1 \\ =& \, p(x_2 \mid z_1) \end{aligned}\]
Now let's use the result of the innermost integral to solve the next one.
\[\int_{x_2} p(x_3 \mid x_2) p(z_2 \mid x_2) p(x_2 \mid z_1) \; dx_2\]
This time we transform $p(z_2 \mid x_2) \cdot p(x_2 \mid z_1)$. The second step uses $p(z_2 \mid x_2) = p(z_2 \mid (x_2 \cap z_1))$, and the step marked $\propto$ replaces the constant $p(z_1)$ with the constant $p(z_1 \cap z_2)$; neither constant depends on $x_2$.
\[\begin{aligned} & p(z_2 \mid x_2) \cdot p(x_2 \mid z_1) \\ =& \; p(z_2 \mid x_2) \cdot \frac{p(x_2 \cap z_1)}{p(z_1)} \\ =& \; \frac{p(z_2 \mid (x_2 \cap z_1)) \cdot p(x_2 \cap z_1)}{p(z_1)} \\ =& \; \frac{p(z_2 \cap x_2 \cap z_1)}{p(z_1)}\\ \propto& \; \frac{p(x_2 \cap (z_1 \cap z_2))}{p(z_1 \cap z_2)}\\ =& \; p(x_2 \mid z_1, z_2) \end{aligned}\]
So the second integral becomes
\[\begin{aligned} & \int_{x_2} p(x_3 \mid x_2) p(z_2 \mid x_2) p(x_2 \mid z_1) \; dx_2 \\ \propto& \int_{x_2} p(x_3 \mid x_2) p(x_2 \mid z_1, z_2) \; dx_2 \\ =& \; p(x_3 \mid z_1, z_2) \end{aligned}\]
Continuing in the same way, feeding the value of each inner integral into the next one and finally applying the last observation factor $p(z_t \mid x_t)$, we obtain the $p(x_t \mid z_{1:t})$ we were after, up to a normalizing constant.
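For reference, the two steps carried out above for $x_1$ and $x_2$ can be written for a general index $k$. This is a sketch under the same HMM assumptions. The labels "prediction" and "update" are the conventional names from Bayesian filtering; they are an addition here, not terminology from the derivation above.

\[\begin{aligned} \text{prediction:} \quad & p(x_{k+1} \mid z_{1:k}) = \int p(x_{k+1} \mid x_k) \, p(x_k \mid z_{1:k}) \; dx_k \\ \text{update:} \quad & p(x_{k+1} \mid z_{1:k+1}) \propto p(z_{k+1} \mid x_{k+1}) \, p(x_{k+1} \mid z_{1:k}) \end{aligned}\]

Setting $k = 1$ reproduces $p(x_2 \mid z_1)$ and $p(x_2 \mid z_1, z_2)$ from the steps above.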
Sequential Density Estimation, which estimates $p(x_t \mid z_{1:t})$, proceeds by using the value obtained from an inner integral to compute the next integral, so it can be seen as a kind of recursion.
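The recursion can be sketched for a discrete HMM, where each integral is a sum over the previous state. The tables and the names `normalize` and `sequential_filter` are hypothetical, with made-up numbers. Normalizing at every step is a choice made here to handle the proportionality constants; the derivation above simply leaves them as $\propto$.

```python
# Hypothetical two-state HMM (made-up numbers).
INITIAL = [0.6, 0.4]                   # p(x_1)
TRANSITION = [[0.7, 0.3], [0.2, 0.8]]  # p(x_t | x_{t-1})
EMISSION = [[0.9, 0.1], [0.3, 0.7]]    # p(z_t | x_t)


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def sequential_filter(zs):
    """Return p(x_t | z_{1:t}) for the last step t of the observations zs."""
    n = len(INITIAL)
    # Innermost step: p(x_1 | z_1) is proportional to p(z_1 | x_1) p(x_1).
    belief = normalize([EMISSION[x][zs[0]] * INITIAL[x] for x in range(n)])
    for z in zs[1:]:
        # Sum out the previous state to get p(x_k | z_{1:k-1}).
        predicted = [
            sum(TRANSITION[prev][cur] * belief[prev] for prev in range(n))
            for cur in range(n)
        ]
        # Fold in the new observation to get p(x_k | z_{1:k}).
        belief = normalize([EMISSION[x][z] * predicted[x] for x in range(n)])
    return belief


posterior = sequential_filter([0, 0, 1])
print(posterior)
```

Each observation is processed once with a small sum over states, so the cost grows linearly with the sequence length instead of exponentially.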