This post is a personal summary I wrote while taking the “Computer Vision” course in the second semester of 2020 and studying on my own. Corrections are always welcome :)



Bayes Theorem

\[P(H \mid E) = \frac{P(E \mid H) P(H)}{P(E)}\]

베이즈 μ •λ¦¬λŠ” 사전 ν™•λ₯ κ³Ό 사후 ν™•λ₯  사이 관계에 λŒ€ν•œ 정리이닀. 베이즈 정리 λ˜λŠ” 베이즈 이둠의 핡심은 사건 $E$κ°€ λ°œμƒν–ˆμ„ λ•Œ, ν™•λ₯  $P(H)$λ₯Ό κ°±μ‹ ν•˜λŠ” 것이닀. λ‹€λ₯΄κ²Œ ν‘œν˜„ν•˜μžλ©΄, 데이터 $X$κ°€ κ΄€μ°° λ˜μ—ˆμ„ λ•Œ, 뢄포 $P(x)$λ₯Ό κ°±μ‹ ν•˜λŠ” 것이닀.

Terminology

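  • $P(H)$: Prior probability
    • The probability (or distribution) set from the information currently available, before $E$ is observed
  • $P(H \mid E)$: Posterior probability
    • The probability (or distribution) updated after the event has occurred
    • It is the prior revised by incorporating new information or data.
  • $P(E \mid H)$: Likelihood
    • The probability of observing the event $E$ given that $H$ holds
  • $P(E)$: Evidence (marginal likelihood)
    • The overall probability of $E$, e.g. $P(E) = \sum_H P(E \mid H) P(H)$ for a discrete $H$; it normalizes the posterior so that it sums (or integrates) to 1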
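A small numeric sketch of these terms may help. It assumes a hypothetical coin that is either fair or biased toward heads, with made-up probabilities, and treats $E$ as "the coin landed heads". The function `bayes_update` is a name introduced here for illustration: it multiplies likelihood by prior, divides by the evidence, and returns the posterior. Feeding the posterior back in as the next prior shows the "update the distribution as data arrives" idea.

```python
# Hypothetical example: H is "fair" or "biased"; E is "the coin landed heads".
# All probabilities are illustrative assumptions, not data.
prior = {"fair": 0.5, "biased": 0.5}        # P(H)
likelihood = {"fair": 0.5, "biased": 0.9}   # P(E | H)


def bayes_update(prior, likelihood):
    """Return (posterior P(H | E), evidence P(E)) for a discrete H."""
    # Numerator of Bayes' theorem: P(E | H) * P(H) for each hypothesis
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence P(E) = sum over H of P(E | H) P(H)
    evidence = sum(unnormalized.values())
    posterior = {h: v / evidence for h, v in unnormalized.items()}
    return posterior, evidence


# One head observed
posterior, evidence = bayes_update(prior, likelihood)
# A second head: the previous posterior becomes the new prior
posterior2, _ = bayes_update(posterior, likelihood)
print(evidence, posterior, posterior2)
```

With these numbers the belief in "biased" rises from 0.5 to about 0.64 after one head and to about 0.76 after two.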



Markov Assumption

<마λ₯΄μ½”ν”„ κ°€μ •; Markov Assumption>은 λ³΅μž‘ν•œ ν™•λ₯ μ˜ 과정을 λ‹¨μˆœν•˜κ²Œ λ§Œλ“€μ–΄ μ€€λ‹€.

The current state $x_{t+1}$ only depends on the previous state $x_{t}$.
So, $x_{t+1}$ is independent to other past states $x_{1:t-1}$
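A minimal sketch of what the assumption buys in practice: to sample the next state, or to score a whole path, only the current state and a transition table are needed. The two-state chain and its transition probabilities `TRANS` below are hypothetical.

```python
import random

# Hypothetical two-state chain (0 = sunny, 1 = rainy); numbers are assumptions.
TRANS = [[0.8, 0.2],   # p(x_{t+1} | x_t = 0)
         [0.4, 0.6]]   # p(x_{t+1} | x_t = 1)


def sample_chain(x1, length, seed=0):
    """Sample x_{1:length}; each step looks only at the latest state."""
    rng = random.Random(seed)
    chain = [x1]
    while len(chain) < length:
        current = chain[-1]  # x_t; the older states chain[:-1] are never read
        chain.append(rng.choices([0, 1], weights=TRANS[current])[0])
    return chain


def path_probability(chain, p_x1):
    """p(x_{1:t}) = p(x_1) * prod_i p(x_{i+1} | x_i), valid under the Markov assumption."""
    p = p_x1[chain[0]]
    for prev, cur in zip(chain, chain[1:]):
        p *= TRANS[prev][cur]
    return p


chain = sample_chain(0, 10)
print(chain, path_probability(chain, [0.5, 0.5]))
```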



Density Estimation

밀도 μΆ”μ •; Density Estimation은 κ΄€μΈ‘λœ λ°μ΄ν„°λ“€μ˜ 뢄포 $z_{1:t}$λ₯Ό λ°”νƒ•μœΌλ‘œ λ³€μˆ˜ $x_t$의 ν™•λ₯  λΆ„ν¬μ˜ νŠΉμ„±μ„ μΆ”μ •ν•˜λŠ” μž‘μ—…μ΄λ‹€. $p(x_t \mid z_{1:t})$

μ΄λ•Œ λ³€μˆ˜ $x_t$의 밀도(density)λ₯Ό μΆ”μ •ν•˜λŠ” 것은 곧, $x_t$의 ν™•λ₯ λ°€λ„ν•¨μˆ˜(pdf; probability density function)을 μΆ”μ •ν•˜λŠ” 것이닀.



Hidden Markov Model

은닉 마λ₯΄μ½”ν”„ λͺ¨λΈ; Hidden Markov Model은 Sequential dataλ₯Ό λ‹€λ£¨λŠ”λ° μ‚¬μš©ν•˜λŠ” λͺ¨λΈμ΄λ‹€.

HMM은 마λ₯΄μ½”ν”„ 가정을 λ”°λ₯Έλ‹€. λ”°λΌμ„œ

\[\begin{aligned} p(x_t \mid x_{1:t-1}) &= p(x_t \mid x_{t_1})\\ p(z_t \mid x_{1:t}) &= p(z_t \mid x_t) \end{aligned}\]

의 μ„±μ§ˆμ„ λ§Œμ‘±ν•œλ‹€.

μœ„μ˜ 두 μ„±μ§ˆμ€ Sequential dataμ—μ„œ Density Estimation ν•˜λŠ” 데에 μ€‘μš”ν•œ λ„κ΅¬λ‘œ μ‚¬μš©λœλ‹€!
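To make the two properties concrete, here is a sketch of a tiny discrete HMM that generates data. The model (two hidden weather states, two observation symbols) and all its tables `INIT`, `TRANS`, `EMIT` are hypothetical; the point is only that each state is drawn from the previous state alone, and each observation from the current state alone.

```python
import random

# Hypothetical 2-state, 2-symbol HMM; all numbers are illustrative assumptions.
# Hidden states: 0 = sunny, 1 = rainy. Observations: 0 = no umbrella, 1 = umbrella.
INIT = [0.5, 0.5]                  # p(x_1)
TRANS = [[0.8, 0.2], [0.4, 0.6]]   # TRANS[i][j] = p(x_t = j | x_{t-1} = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # EMIT[i][k] = p(z_t = k | x_t = i)


def draw(probs, rng):
    """Sample an index according to the probability list `probs`."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def sample_hmm(t, seed=0):
    """Generate hidden states x_{1:t} and observations z_{1:t}."""
    rng = random.Random(seed)
    xs, zs = [], []
    for i in range(t):
        # Property 1: x_t is drawn from p(x_t | x_{t-1}) only
        x = draw(INIT, rng) if i == 0 else draw(TRANS[xs[-1]], rng)
        # Property 2: z_t is drawn from p(z_t | x_t) only
        z = draw(EMIT[x], rng)
        xs.append(x)
        zs.append(z)
    return xs, zs


xs, zs = sample_hmm(8)
print(xs, zs)
```

In a filtering problem only `zs` would be available; the task is to recover a distribution over the hidden `xs`.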



Sequential Density Estimation

Sequential Density Estimation is the process of estimating the pdf for sequential data.

That is, we estimate the pdf $p(x_t \mid z_{1:t})$.

The process is a bit involved, so let's go through it slowly.


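First, we split the conditional probability $p(x_t \mid z_{1:t})$ using the definition of conditional probability (the relation underlying Bayes' theorem).

\[\begin{aligned} p(x_t \mid z_{1:t}) = \frac{p(x_t, z_{1:t})}{p(z_{1:t})} \end{aligned}\]

Here, the denominator $p(z_{1:t})$ is the probability of the observations. Since the observations $z_{1:t}$ are already given, it does not depend on $x_t$ and can be treated as a constant.

Therefore, $p(x_t \mid z_{1:t})$ satisfies: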

\[\begin{aligned} p(x_t \mid z_{1:t}) \propto p(x_t, z_{1:t}) \end{aligned}\]

이제 우리의 λͺ©ν‘œλŠ” $p(x_t \mid z_{1:t})$λ₯Ό κ΅¬ν•˜λŠ” κ²ƒμ—μ„œ $p(x_t, z_{1:t})$λ₯Ό κ΅¬ν•˜λŠ” 것이 λ˜μ—ˆλ‹€.

\[\begin{aligned} p(x_t, z_{1:t}) = \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]

The equation above describes how to compute $p(x_t, z_{1:t})$. It should be read from right to left rather than from left to right, i.e., starting from the integral over $x_1$:

\[\begin{aligned} \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]

Let's start with the innermost integral.

\[\begin{aligned} \int p(\{ x_1, \cdots, x_{t-1}, x_t \}, z_{1:t}) \; dx_1 \end{aligned}\]

This marginalizes $x_1$ out of $p(\{ x_1, \cdots, x_{t-1}, x_t \}, z_{1:t})$.

This step needs no independence assumption. By the sum rule (law of total probability), integrating a joint density over one of its variables always gives the joint density of the remaining variables, whether or not the removed variable depends on them. (In an HMM, $x_1$ is in fact dependent on both $x_2$ and $z_1$.) Hence

\[\begin{aligned} \int p(\{ x_1, \cdots, x_{t-1}, x_t \}, z_{1:t}) \; dx_1 = p(\{ x_2, \cdots, x_{t-1}, x_t \}, z_{1:t}) \end{aligned}\]

Repeating this for $x_2$ through $x_{t-1}$ gives the following result.

\[\begin{aligned} p(x_t, z_{1:t}) = \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]
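In the discrete case each integral becomes a sum, and marginalizing is just adding up the table entries that differ only in the variable being removed. The sketch below uses a hypothetical three-variable joint table whose variables are deliberately dependent, to show that summing out still yields a valid joint of the remaining variables. The helper `sum_out_first` is a name introduced for illustration.

```python
from itertools import product

# Hypothetical joint table p(x_1, x_2, x_3) over binary variables.
# The weights tie neighbouring variables together, so they are NOT independent.
weights = {xs: 1 + 3 * (xs[0] == xs[1]) + 2 * (xs[1] == xs[2])
           for xs in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {xs: w / total for xs, w in weights.items()}


def sum_out_first(table):
    """Discrete analogue of integrating out the first variable:
    p(x_2, ..., x_n) = sum over x_1 of p(x_1, x_2, ..., x_n)."""
    out = {}
    for xs, p in table.items():
        out[xs[1:]] = out.get(xs[1:], 0.0) + p
    return out


p_x2_x3 = sum_out_first(joint)      # x_1 summed out
p_x3 = sum_out_first(p_x2_x3)       # then x_2 summed out
print(p_x2_x3, p_x3)
```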


Next, let's look at the function being integrated in the nested integral, $p(x_{1:t}, z_{1:t})$.

By the product rule (the definition of conditional probability), $p(x_{1:t}, z_{1:t})$ can be written as:

\[\begin{aligned} p(x_{1:t}, z_{1:t}) = p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t}) \end{aligned}\]

Rewriting the nested integral accordingly gives

\[\begin{aligned} & \int \cdots \int \int p(x_{1:t}, z_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \\ =& \int \cdots \int \int p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \end{aligned}\]


이제 ν•¨μˆ˜ $p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t})$λ₯Ό μ£Όλͺ©ν•΄λ³΄μž.

λ¨Όμ € μœ„μ˜ μ‹μ—μ„œ $p(x_{1:t})$ 뢀뢄은 probabilistic dependency에 μ˜ν•΄ μ•„λž˜μ™€ 같이 ν‘œν˜„ν•  수 μžˆλ‹€.

\[p(x_{1:t}) = p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1)\]

$p(z_{1:t} \mid x_{1:t})$ λΆ€λΆ„ μ—­μ‹œ probabilistic dependency에 μ˜ν•΄ μ•„λž˜μ™€ 같이 ν‘œν˜„λœλ‹€.

\[p(z_{1:t} \mid x_{1:t}) = p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1)\]

μ’…ν•©ν•˜λ©΄, μ•„λž˜μ™€ κ°™λ‹€.

\[\begin{aligned} p(z_{1:t} \mid x_{1:t}) & \cdot p(x_{1:t}) \\ = \left( p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1) \right) & \cdot \left( p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1) \right) \end{aligned}\]
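For a discrete HMM this factorization can be evaluated directly. The sketch below assumes a hypothetical two-state, two-symbol model (`INIT`, `TRANS`, `EMIT` are made-up tables) and uses the factorized joint to compute $p(x_t \mid z_{1:t})$ the slow way: sum the joint over every hidden path $x_{1:t-1}$, then normalize over $x_t$. The number of terms grows as $K^{t}$ for $K$ states, which is what makes the rearrangement below worthwhile.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol HMM; every number is an illustrative assumption.
INIT = [0.5, 0.5]                  # p(x_1)
TRANS = [[0.8, 0.2], [0.4, 0.6]]   # TRANS[i][j] = p(x_t = j | x_{t-1} = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # EMIT[i][k] = p(z_t = k | x_t = i)
K = len(INIT)


def joint(xs, zs):
    """p(x_{1:t}, z_{1:t}) = [p(x_1) prod p(x_i | x_{i-1})] * [prod p(z_i | x_i)]."""
    p = INIT[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        p *= TRANS[prev][cur]          # transition factors
    for x, z in zip(xs, zs):
        p *= EMIT[x][z]                # emission factors
    return p


def brute_force_posterior(zs):
    """p(x_t | z_{1:t}): sum the joint over all x_{1:t-1}, then normalize over x_t."""
    unnorm = [0.0] * K                 # p(x_t, z_{1:t}) for each value of x_t
    for past in product(range(K), repeat=len(zs) - 1):
        for xt in range(K):
            unnorm[xt] += joint(list(past) + [xt], zs)
    evidence = sum(unnorm)             # p(z_{1:t})
    return [u / evidence for u in unnorm]


print(brute_force_posterior([1, 1, 0]))
```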

이것을 적뢄식에 λ°˜μ˜ν•˜λ©΄ μ•„λž˜μ™€ κ°™λ‹€.

\[\begin{aligned} & \int \cdots \int \int p(z_{1:t} \mid x_{1:t}) \cdot p(x_{1:t}) \; dx_1 dx_2 \cdots dx_{t-1} \\ = & \int \cdots \int \int p(z_{1:t} \mid x_{1:t}) \cdot \left( p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1) \right) \cdot \left( p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1) \right) \cdots dx_{t-1} \end{aligned}\]

식이 μ•„μ£Όμ•„μ£Ό λ³΅μž‘ν•΄μ‘Œλ‹€ γ… γ… 


ν•˜μ§€λ§Œ, μœ„μ˜ μ€‘μ²©λœ 적뢄식은 μ˜μ™Έλ‘œ 쒋은 μ„±μ§ˆμ„ 가지고 μžˆλ‹€!!

μœ„μ˜ μ λΆ„μ‹μ—μ„œ 적뢄에 κ΄€μ—¬ν•˜μ§€ μ•ŠλŠ” λ³€μˆ˜λ₯Ό λͺ¨λ‘ λΆ„λ¦¬ν•΄λ³΄μž. 그러면 μ•„λž˜μ™€ 같은 식을 μ–»λŠ”λ‹€.

\[\begin{aligned} & \int \cdots \int \int p(z_{1:t} \mid x_{1:t}) \cdot \left( p(x_t \mid x_{t-1}) \cdots p(x_2 \mid x_1) p(x_1) \right) \cdot \left( p(z_t \mid x_t) \cdots p(z_2 \mid x_2) p(z_1 \mid x_1) \right) \cdots dx_{t-1} \\ =& \; p(z_{1:t} \mid x_{1:t}) \left( \int_{x_{t-1}} p(x_t \mid x_{t-1}) p(z_{t-1} \mid x_{t-1}) \cdots \left( \int_{x_2} p(x_3 \mid x_2)p(z_2 \mid x_2) \left( \int_{x_1} p(x_2 \mid x_1)p(z_1 \mid x_1) p(x_1) dx_1 \right) dx_2 \right) \cdots dx_{t-1} \right) \end{aligned}\]

식이 더 κΈΈμ–΄μ‘Œλ‹€ γ… γ… 


To evaluate the nested integral, let's start from the innermost one.

\[\int_{x_1} p(x_2 \mid x_1) p(z_1 \mid x_1) p(x_1) \; dx_1\]

베이즈 정리에 μ˜ν•΄ $p(z_1 \mid x_1) p(x_1) = \dfrac{p(z_1, x_1)}{p(x_1)} p(x_1) = p(z_1, x_1)$이닀.

μ΄λ•Œ, $p(z_1, x_1) = \dfrac{p(x_1 \mid z_1)}{p(z_1)}$인데, $p(z_1)$이 μƒμˆ˜μ΄λ―€λ‘œ $p(z_1, x_1) \propto p(x_1 \mid z_1)$이닀.

μ •λ¦¬ν•˜λ©΄,

\[p(z_1 \mid x_1) p(x_1) = p(z_1, x_1) \propto p(x_1 \mid z_1)\]

λ”°λΌμ„œ

\[\begin{aligned} & \int_{x_1} p(x_2 \mid x_1) p(z_1 \mid x_1) p(x_1) \; dx_1 \\ \propto& \int_{x_1} p(x_2 \mid x_1) p(x_1 \mid z_1) \; dx_1 \end{aligned}\]

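Here, $p(x_2 \mid x_1) p(x_1 \mid z_1)$ can be rearranged into $p(x_2, x_1 \mid z_1)$. The third equality below uses the HMM property that, given $x_1$, the next state $x_2$ is conditionally independent of $z_1$, i.e., $p(x_2 \mid x_1 \cap z_1) = p(x_2 \mid x_1)$.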

\[\begin{aligned} p(x_2, x_1 \mid z_1) &= \frac{p(x_2 \cap x_1 \cap z_1)}{p(z_1)} \\ &= \frac{p(x_2 \mid (x_1 \cap z_1)) \cdot p(x_1 \cap z_1)}{p(z_1)} \\ &= p(x_2 \mid x_1) \cdot \frac{p(x_1 \cap z_1)}{p(z_1)} \\ &= p(x_2 \mid x_1) \cdot p(x_1 \mid z_1) \end{aligned}\]

So the integral can be rewritten as follows.

\[\begin{aligned} & \int_{x_1} p(x_2 \mid x_1) p(x_1 \mid z_1) \; dx_1 \\ =& \int_{x_1} p(x_2, x_1 \mid z_1) \; dx_1 \end{aligned}\]

적뢄식을 κ³„μ‚°ν•˜λ©΄, $x_1$이 marginalize outλœλ‹€.

\[\begin{aligned} & \int_{x_1} p(x_2, x_1 \mid z_1) \; dx_1 \\ =& \, p(x_2 \mid z_1) \end{aligned}\]
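A quick numerical check of this first step, under the same hypothetical two-state HMM tables: the raw innermost sum $\sum_{x_1} p(x_2 \mid x_1) p(z_1 \mid x_1) p(x_1)$ equals $p(x_2, z_1)$, so it is proportional to $p(x_2 \mid z_1)$ with constant $p(z_1)$, while predicting from the normalized $p(x_1 \mid z_1)$ gives $p(x_2 \mid z_1)$ directly. The observed symbol `z1 = 1` is an arbitrary choice.

```python
# Hypothetical 2-state, 2-symbol HMM (illustrative numbers only).
INIT = [0.5, 0.5]                  # p(x_1)
TRANS = [[0.8, 0.2], [0.4, 0.6]]   # TRANS[i][j] = p(x_2 = j | x_1 = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # EMIT[i][k] = p(z = k | x = i)
K = len(INIT)
z1 = 1                             # the observed z_1 (arbitrary choice)

# p(z_1 | x_1) p(x_1) = p(z_1, x_1); its sum over x_1 is p(z_1)
joint_x1 = [EMIT[i][z1] * INIT[i] for i in range(K)]
p_z1 = sum(joint_x1)
post_x1 = [v / p_z1 for v in joint_x1]          # p(x_1 | z_1)

# Raw innermost integral: sum_{x_1} p(x_2 | x_1) p(z_1 | x_1) p(x_1) = p(x_2, z_1)
raw = [sum(TRANS[i][j] * joint_x1[i] for i in range(K)) for j in range(K)]

# Normalized version: sum_{x_1} p(x_2 | x_1) p(x_1 | z_1) = p(x_2 | z_1)
pred_x2 = [sum(TRANS[i][j] * post_x1[i] for i in range(K)) for j in range(K)]

print(raw, pred_x2, p_z1)
```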


Using the result of the innermost integral, let's solve the next integral.

\[\int_{x_2} p(x_3 \mid x_2) p(z_2 \mid x_2) p(x_2 \mid z_1) \; dx_2\]

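This time we transform $p(z_2 \mid x_2) \cdot p(x_2 \mid z_1)$. The second equality uses the HMM property that, given $x_2$, the observation $z_2$ is conditionally independent of $z_1$, so $p(z_2 \mid x_2) = p(z_2 \mid x_2 \cap z_1)$. The final step uses the fact that $p(z_2 \mid z_1)$ does not depend on $x_2$, so it is a constant.

\[\begin{aligned} & p(z_2 \mid x_2) \cdot p(x_2 \mid z_1) \\ =& \; p(z_2 \mid x_2) \cdot \frac{p(x_2 \cap z_1)}{p(z_1)} \\ =& \; \frac{p(z_2 \mid (x_2 \cap z_1)) \cdot p(x_2 \cap z_1)}{p(z_1)} \\ =& \; \frac{p(z_2 \cap x_2 \cap z_1)}{p(z_1)}\\ =& \; \frac{p(x_2 \mid z_1, z_2) \cdot p(z_1 \cap z_2)}{p(z_1)}\\ =& \; p(x_2 \mid z_1, z_2) \cdot p(z_2 \mid z_1) \\ \propto& \; p(x_2 \mid z_1, z_2) \end{aligned}\]

Therefore, the second integral becomes the following; the last equality uses the fact that, given $x_2$, $x_3$ is conditionally independent of $z_1$ and $z_2$.

\[\begin{aligned} & \int_{x_2} p(x_3 \mid x_2) p(z_2 \mid x_2) p(x_2 \mid z_1) \; dx_2 \\ \propto& \int_{x_2} p(x_3 \mid x_2) p(x_2 \mid z_1, z_2) \; dx_2 \\ =& \; p(x_3 \mid z_1, z_2) \end{aligned}\]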
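The same kind of check for the second step, again with the hypothetical two-state tables and arbitrarily chosen observations `z1 = 1`, `z2 = 0`: multiplying $p(x_2 \mid z_1)$ by $p(z_2 \mid x_2)$ and normalizing should reproduce $p(x_2 \mid z_1, z_2)$ computed straight from the joint, and the normalizing constant should equal $p(z_2 \mid z_1)$. The helper `normalize` is introduced here for illustration.

```python
# Hypothetical 2-state, 2-symbol HMM (illustrative numbers only).
INIT = [0.5, 0.5]                  # p(x_1)
TRANS = [[0.8, 0.2], [0.4, 0.6]]   # TRANS[i][j] = p(x_t = j | x_{t-1} = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # EMIT[i][k] = p(z_t = k | x_t = i)
K = len(INIT)
z1, z2 = 1, 0                      # observed symbols (arbitrary choice)


def normalize(v):
    """Return (v / sum(v), sum(v))."""
    s = sum(v)
    return [u / s for u in v], s


# First step: p(x_1 | z_1), then p(x_2 | z_1)
post_x1, p_z1 = normalize([EMIT[i][z1] * INIT[i] for i in range(K)])
pred_x2 = [sum(TRANS[i][j] * post_x1[i] for i in range(K)) for j in range(K)]

# Update: p(z_2 | x_2) p(x_2 | z_1) = p(x_2 | z_1, z_2) * p(z_2 | z_1)
post_x2, p_z2_given_z1 = normalize([EMIT[j][z2] * pred_x2[j] for j in range(K)])

# Predict: p(x_3 | z_1, z_2) = sum_{x_2} p(x_3 | x_2) p(x_2 | z_1, z_2)
pred_x3 = [sum(TRANS[j][k] * post_x2[j] for j in range(K)) for k in range(K)]

# Check against the joint: p(x_2, z_1, z_2) = sum_{x_1} p(x_1) p(z_1|x_1) p(x_2|x_1) p(z_2|x_2)
brute_post_x2, p_z1_z2 = normalize(
    [sum(INIT[i] * EMIT[i][z1] * TRANS[i][j] * EMIT[j][z2] for i in range(K))
     for j in range(K)])

print(post_x2, brute_post_x2, pred_x3)
```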


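Continuing in the same way, feeding each inner integral's value into the next, we eventually reach a quantity proportional to $p(x_t \mid z_{1:t-1})$. Multiplying it by $p(z_t \mid x_t)$, the only factor that does not sit inside any integral, and normalizing over $x_t$ gives the desired $p(x_t \mid z_{1:t})$.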


Because Sequential Density Estimation of $p(x_t \mid z_{1:t})$ proceeds by using the value obtained from each inner integral to compute the next one, it can be viewed as a kind of recursion.
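Written as a recursion, each time step alternates a prediction and an update:

\[\begin{aligned} p(x_t \mid z_{1:t-1}) &= \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid z_{1:t-1}) \; dx_{t-1} \\ p(x_t \mid z_{1:t}) &\propto p(z_t \mid x_t) \, p(x_t \mid z_{1:t-1}) \end{aligned}\]

For discrete states this is the normalized form of what is usually called the forward algorithm. The sketch below implements it for the same hypothetical two-state HMM tables, with `forward_filter` and `brute_force` as illustrative names, and compares the final belief with brute-force enumeration over all hidden paths; the two should agree up to floating-point error.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol HMM (illustrative numbers only).
INIT = [0.5, 0.5]                  # p(x_1)
TRANS = [[0.8, 0.2], [0.4, 0.6]]   # TRANS[i][j] = p(x_t = j | x_{t-1} = i)
EMIT = [[0.9, 0.1], [0.2, 0.8]]    # EMIT[i][k] = p(z_t = k | x_t = i)
K = len(INIT)


def forward_filter(zs):
    """Return [p(x_1|z_1), p(x_2|z_{1:2}), ..., p(x_t|z_{1:t})]."""
    beliefs = []
    prior = INIT                                  # p(x_1) before any observation
    for z in zs:
        # Update: multiply by p(z_t | x_t), then normalize
        unnorm = [EMIT[j][z] * prior[j] for j in range(K)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        beliefs.append(belief)
        # Predict: p(x_{t+1} | z_{1:t}) = sum_{x_t} p(x_{t+1} | x_t) p(x_t | z_{1:t})
        prior = [sum(TRANS[i][j] * belief[i] for i in range(K)) for j in range(K)]
    return beliefs


def brute_force(zs):
    """p(x_t | z_{1:t}) by enumerating all K**t hidden paths (for checking only)."""
    unnorm = [0.0] * K
    for xs in product(range(K), repeat=len(zs)):
        p = INIT[xs[0]] * EMIT[xs[0]][zs[0]]
        for i in range(1, len(zs)):
            p *= TRANS[xs[i - 1]][xs[i]] * EMIT[xs[i]][zs[i]]
        unnorm[xs[-1]] += p
    total = sum(unnorm)
    return [u / total for u in unnorm]


zs = [1, 1, 0, 1, 0]
print(forward_filter(zs)[-1])
print(brute_force(zs))
```

Keeping only the latest belief vector is what makes the recursion cheap: each new observation costs a fixed amount of work, no matter how long the sequence already is.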