2020-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜์•Œ๊ณ ๋ฆฌ์ฆ˜โ€™ ์ˆ˜์—…์„ ๋“ฃ๊ณ  ๊ณต๋ถ€ํ•œ ๋ฐ”๋ฅผ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

3 minute read

2020-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜์•Œ๊ณ ๋ฆฌ์ฆ˜โ€™ ์ˆ˜์—…์„ ๋“ฃ๊ณ  ๊ณต๋ถ€ํ•œ ๋ฐ”๋ฅผ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)


๋‘ string ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์„๊นŒ? ๊ฐ€์žฅ ์‰ฌ์šด ๋ฐฉ๋ฒ•์€ ๋‘ string์ด ๊ฐ™์€ ๋ฌธ์ž๋กœ aligned ๋˜๋„๋ก ๋งค์นญํ•ด ์–ด๊ธ‹๋‚˜๋Š” ๋ฌธ์ž์˜ ์ˆซ์ž๋กœ ๊ฑฐ๋ฆฌ๋ฅผ ๋งค๊ธธ ์ˆ˜ ์žˆ๋‹ค.

์ด๋•Œ, align ํ•˜๋Š” ๋ฐฉ์‹์— ๋”ฐ๋ผ, ๋‘ string ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ž˜์„œ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  align ๋ฐฉ์‹ ์ค‘ ๊ฐ€์žฅ ์ž‘์€ <cost>๋ฅผ <edit distance>๋ผ๊ณ  ์ •์˜ํ•œ๋‹ค.

์ฃผ์–ด์ง„ ๋‘ string ์‚ฌ์ด์˜ <edit distance>๋ฅผ ์–ด๋–ป๊ฒŒ ์ธก์ •ํ•  ์ˆ˜ ์žˆ์„๊นŒ? DP๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด, ์ •๋ง ์‰ฝ๊ฒŒ <edit distance>๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค!!


DP๋กœ ์ ‘๊ทผํ•˜๊ธฐ ์œ„ํ•ด ๋จผ์ € ๋ฌธ์ œ๋ฅผ subproblem์œผ๋กœ ๋ถ„ํ• ํ•ด์•ผ ํ•œ๋‹ค. ์ด๋•Œ, ์šฐ๋ฆฌ๊ฐ€ ๊ฐ€์žฅ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ์ƒ๊ฐํ•ด๋ณผ ์ˆ˜ ์žˆ๋Š” ๋ถ„ํ•  ๋ฐฉ์‹์€ ๋‘ string์„ prefix๋กœ ๋ถ„ํ• ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.

Given two strings, $x[1, \dots, m]$ and $y[1, \dots, n]$. A good subproblem would be to compute the optimal edit distance btw some prefixes, $x[1, \dots, i]$ and $y[1, \dots, j]$.

Let $E(i, j)$ be the optimal edit distance btw two prefixes. Then

\[E(i, j) = \min \begin{cases} E(i-1, j) + 1 \\ E(i, j-1) + 1 \\ E(i, j) + \texttt{diff}(i, j) \end{cases}\]

์•„์ด๋””์–ด๋Š” ๊ฐ„๋‹จํ•˜๋‹ค. ๋‘ prefix์— ๋Œ€ํ•ด, ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๊ฐ€ ๊ณต๋ฐฑ์ผ ๋•Œ์˜ ๋‘ ๊ฒฝ์šฐ์™€ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๋ฅผ ๋งž์ท„์„ ๋•Œ์˜ ๊ฒฝ์šฐ๋ฅผ ์ข…ํ•ฉํ•ด ์ตœ์†Œ edit distance์˜ ๊ฒฝ์šฐ๋ฅผ ์ทจํ•˜๋Š” ์•„์ด๋””์–ด๋‹ค!

// initialization
for $i=0, 1, 2, \dots, m$
โ€ƒโ€ƒ$E(i, 0) = i$

for $j=0, 1, 2, \dots, n$
โ€ƒโ€ƒ$E(0, j) = j$

// dp process
for $i=1, 2, \dots, m$
โ€ƒโ€ƒfor $j=1, 2, \dots, n$
โ€ƒโ€ƒโ€ƒโ€ƒ$E(i, j) = \min \{ E(i-1, j) + 1, \; E(i, j-1) + 1, \; E(i-1, j-1) + \texttt{diff}(i, j)\}$

return $E(m, n)$

<Edit Distance> ๋ฌธ์ œ๋Š” ๋‘ string์˜ idx๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— 2์ฐจ์› DP๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์œ„์˜ ๊ทธ๋ฆผ์—์„œ๋„ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ <Edit Distance>์—์„œ๋Š” DAG ๊ตฌ์กฐ๊ฐ€ ์žˆ๋‹ค!


<edit distance>๋Š” ๋ฌธ์ž์—ด ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ •์˜ํ•˜๋Š” ์ข‹์€ ๊ธฐ์ค€์ด ๋œ๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— ๋ฌธ์ž์—ด ๋ฌธ์ œ๊ฐ€ ์ž์ฃผ ๋“ฑ์žฅํ•˜๋Š” ์ฝ”๋”ฉ ํ…Œ์ŠคํŠธ์—์„œ๋„ ๋นˆ์ถœ์ด ๋˜๋Š” ์ฃผ์ œ์ด๋ฏ€๋กœ ๋Š˜ ์ˆ™์ง€ํ•ด๋‘ฌ์•ผ ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋ผ๊ณ  ์ƒ๊ฐํ•œ๋‹ค ๐Ÿ˜

<edit distance>๋Š” โ€œ์‚ฝ์ž…โ€, โ€œ์‚ญ์ œโ€, โ€œ๊ต์ฒดโ€์˜ ์„ธ ๊ฐ€์ง€ ์—ฐ์‚ฐ์„ ํ†ตํ•ด ์ตœ์†Œ ํŽธ์ง‘ ๊ฑฐ๋ฆฌ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. ์ด๋•Œ, ์ธ์ ‘ํ•œ ๋‘ ๊ธ€์ž๋ผ๋ฆฌ ๊ตํ™˜ํ•˜๋Š” โ€œ๊ตํ™˜(transposition)โ€ ์—ฐ์‚ฐ์ด ์ถ”๊ฐ€๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ <Damerau-Levenshtein distance>๋‹ค. ๋‚ด์šฉ๊ณผ ์ฝ”๋“œ๋ฅผ ํฌ์ŠคํŠธ๋กœ ์ •๋ฆฌํ–ˆ์œผ๋‹ˆ, ๋ฌธ์ž์—ด ๊ฑฐ๋ฆฌ์— ๋Œ€ํ•ด ๋” ๊ด€์‹ฌ์ด ์žˆ๋‹ค๋ฉด, ์•„๋ž˜์˜ ํฌ์ŠคํŠธ๋ฅผ ์ถ”์ฒœํ•œ๋‹ค! ๐Ÿ˜‰

๐Ÿ‘‰ Damerau-Levenshtein distance

์ถ”์ฒœ ๋ฌธ์ œ

Categories:

Updated: