์ด ํฌ์ŠคํŠธ๋Š” ๋ฐฑ์ค€ 15484๋ฒˆ: ์ตœ์†ŒํŽธ์ง‘2์™€ 2326๋ฒˆ: ์ตœ์†Œ ํŽธ์ง‘ ๋ฌธ์ œ2์—์„œ ์“ฐ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ <Damerauโ€“Levenshtein distance>์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•˜๋Š” ํฌ์ŠคํŠธ์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

7 minute read

์ด ํฌ์ŠคํŠธ๋Š” ๋ฐฑ์ค€ 15484๋ฒˆ: ์ตœ์†ŒํŽธ์ง‘2์™€ 2326๋ฒˆ: ์ตœ์†Œ ํŽธ์ง‘ ๋ฌธ์ œ2์—์„œ ์“ฐ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ <Damerauโ€“Levenshtein distance>์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•˜๋Š” ํฌ์ŠคํŠธ์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)


Damerauโ€“Levenshtein distance

<ํŽธ์ง‘ ๊ฑฐ๋ฆฌ; edit distance>๋Š” ๋Œ€ํ‘œ์ ์ธ DP ๋ฌธ์ œ์ด๋‹ค. ๋ฐฑ์ค€์—์„œ๋Š” 15483๋ฒˆ: ์ตœ์†ŒํŽธ์ง‘ ๋ฌธ์ œ๋ฅผ ํ†ตํ•ด <edit distance>์— ๋Œ€ํ•œ ๋ฌธ์ œ๋ฅผ ํ’€ ์ˆ˜ ์žˆ๋‹ค.

๋จผ์ € <edit distance>์˜ ์ƒํ™ฉ์„ ์‚ดํŽด๋ณด๋ฉด, ๋‘ String s1, s2์— ๋Œ€ํ•ด

  1. ์‚ฝ์ž…(Insertion)
  2. ์‚ญ์ œ(Deletion)
  3. ๊ต์ฒด(Replacement)

์„ธ ๊ฐ€์ง€ ์—ฐ์‚ฐ์„ ํ†ตํ•ด ๋‘ ๋ฌธ์ž๋ฅผ ๊ฐ™๊ฒŒ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋Š” ์ตœ์†Œ ํšŸ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค.

๊ทธ๋Ÿฐ๋ฐ, ์ตœ์†ŒํŽธ์ง‘ ๋ฌธ์ œ์˜ ๋‹ค์Œ ๋ฌธ์ œ์ธ ์ตœ์†ŒํŽธ์ง‘2์—์„œ๋Š” ํ•˜๋‚˜์˜ ์—ฐ์‚ฐ์ด ๋” ์ถ”๊ฐ€๋œ๋‹ค.

4. ๊ตํ™˜(Transposition with adjacent character): ๋‘ ์ธ์ ‘ํ•œ ๊ธ€์ž์˜ ์œ„์น˜๋ฅผ ์„œ๋กœ ๋ฐ”๊ฟ€ ์ˆ˜ ์žˆ๋‹ค.

๋ฌธ์ œ๋ฅผ ๋ณด๊ณ , ์ฒ˜์Œ์—๋Š” ๊ฝค ํ• ๋งŒ ํ•˜๋‹ค๊ณ  ์ƒ๊ฐํ•ด์„œ ๋‹จ์ˆœํžˆ โ€œ๊ตํ™˜ํ–ˆ์„ ๋•Œ matching cost๊ฐ€ 0์ด ๋˜๊ณ , ๋˜ ๊ทธ๋•Œ์˜ DP ๊ฐ’์ด ์ž‘๋‹ค๋ฉด, Transpotion!โ€์ธ ์‹์œผ๋กœ 4๋ฒˆ์งธ ์—ฐ์‚ฐ์„ ์ฒ˜๋ฆฌํ•ด๋ดค๋Š”๋ฐ, WA๋ฅผ ๋ฐ›์•˜์—ˆ๋‹ค.

๊ฒ€์ƒ‰ํ•ด๋ณด๋‹ˆ, โ€˜Transpotionโ€™ ์—ฐ์‚ฐ์€ ์•ž์˜ ๊ฒฝ์šฐ๋ž‘์€ ๋‹ค๋ฅด๊ฒŒ ์‰ฝ๊ฒŒ ๊ณ„์‚ฐ๋˜์ง€ ์•Š๊ณ , <Damerauโ€“Levenshtein distance>1๋ฅผ ํ†ตํ•ด ๊ตฌํ•ด์•ผ ํ•œ๋‹ค.

<DL-distance>๋Š” ์˜คํƒ€ ์ˆ˜์ •(misspell correction)์—์„œ ํ™œ์šฉํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. DL-distance์—์„œ ์“ฐ๋Š” 4๊ฐ€์ง€ ๊ฒฝ์šฐ๋Š” ์šฐ๋ฆฌ๊ฐ€ ํ”ํžˆ ํ•˜๋Š” ์˜คํƒ€์˜ ๊ฒฝ์šฐ๋“ค์ด๋ฉฐ, ๊ตฌ๊ธ€ ๊ฒ€์ƒ‰์—์„œ ๊ฒ€์ƒ‰์„ ์ž˜๋ชป ์ž…๋ ฅํ•ด๋„ ์ ์ ˆํžˆ ์›๋ž˜์˜ ํ˜•ํƒœ๋กœ ๋ณ€ํ™˜ํ•ด์ฃผ๋Š” ๊ฒƒ ์—ญ์‹œ ์ด๋Ÿฐ DL-distance๋กœ ์œ ์‚ฌ๋„๋ฅผ ํŒ๋‹จํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.


Algorithm

DL-distance๋ฅผ ๊ตฌํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์—๋Š” <OSA distance; Optimal String Alignment distance>์™€ <DL-distance with adjacent transposition>, ๋‘ ๊ฐ€์ง€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์žˆ๋‹ค. ์ด๋•Œ, ํ›„์ž๋ฅผ true DL-distance๋ผ๊ณ  ์—ฌ๊ธด๋‹ค.

OSA์™€ DL ๋ชจ๋‘ transposition correction์„ ์ง€์›ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ <OSA distanace>์˜ ๊ฒฝ์šฐ โ€œno substring is edited more than onceโ€๋ผ๋Š” ์ก”์•ฝ์ด ์žˆ๋‹ค. ๋ณธ์ธ์€ ๋Œ€์ถฉ โ€œtranspositionํ•œ substring ์‚ฌ์ด์— insertion์„ ํฌํ•จํ•œ ๋‹ค๋ฅธ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์—†๋‹คโ€๋ผ๊ณ  ์ œ์•ฝ์„ ์ดํ•ดํ–ˆ๋‹ค.

์ด ์ œ์•ฝ์ด ์™œ critical ํ•œ์ง€๋Š” Wikipedia์— ์ œ์‹œ๋˜๋Š” ์˜ˆ์‹œ๋ฅผ ํ†ตํ•ด ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค.

๐Ÿ‘‰ example from Wikipedia

OSA ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ตฌํ˜„ํ•˜๊ธฐ๋„ ์‰ฝ๊ณ , ๋˜ ๋ณธ์ธ์ด ์ฒ˜์Œ ์‹œ๋„ํ•œ ์ ‘๊ทผ์ผ ๋งŒํผ ์ง๊ด€์ ์ด๊ธฐ ๋•Œ๋ฌธ์— ์—ฌ๊ธฐ์„œ ๋”ฐ๋กœ ์„ค๋ช…ํ•˜์ง€๋Š” ์•Š๊ฒ ๋‹ค ๐Ÿ˜‰ ์ด๋ฒˆ ํฌ์ŠคํŠธ์—์„œ๋Š” true DL-distance๋งŒ ํ•œ๋ฒˆ ์ œ๋Œ€๋กœ ์‚ดํŽด๋ณด์ž!


Algorithmic Detail

์ด ๋ถ€๋ถ„์€ โ€˜๊ตฌ๋ฐํƒ€๋งˆ ๊ตฌ๋ฐํƒ€๋งˆโ€™๋‹˜์˜ ํฌ์ŠคํŠธ์˜ ํฌ์ŠคํŠธ๋ฅผ ๋งŽ์ด ์ฐธ๊ณ ํ–ˆ๋‹ค. (๋‹จ, ๊ตฌ๋ฐํƒ€๋งˆ ๋‹˜๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ, Wikipedia์— ์ œ์‹œ๋œ DL-distance ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ทธ๋Œ€๋กœ ์ดํ•ดํ•˜๊ณ  ํ•ด์„ํ•ด์„œ ๊ธฐ์ˆ ํ•˜์˜€๋‹ค.)

๋จผ์ €, ์‚ฝ์ž…, ์‚ญ์ œ, ๊ต์ฒด์— ๋Œ€ํ•œ ๋ถ€๋ถ„์€ ๊ธฐ์กด์˜ <edit distance>์™€ ๋™์ผํ•˜๋‹ค.

์ถ”๊ฐ€๋œ transposition ์—ฐ์‚ฐ์€ ์•„๋ž˜์™€ ๊ฐ™์ด ์ˆ˜ํ–‰๋œ๋‹ค. s1 = "ca", s2="abc"์˜ ์˜ˆ์‹œ๋ฅผ ํ†ตํ•ด ์‚ดํŽด๋ณด์ž.

๋จผ์ €, ํ˜„์žฌ i=2, j=3๋ผ๊ณ  ํ•˜์ž. ์ฆ‰, s1[i=2] = 'a', s2[j=3]='c'์ธ ์ƒํ™ฉ์ด๋‹ค.

์ด๋•Œ, s2์—์„œ j=3๋ณด๋‹ค ์ž‘์œผ๋ฉด์„œ, s1[i=2]='a'์™€ ๊ฐ™์€ ๋ฌธ์ž๋ฅผ ๊ฐ–๋Š” ๊ณณ์˜ ์œ„์น˜๋ฅผ l์ด๋ผ๊ณ  ํ•˜์ž! ๋ฐ˜๋Œ€๋กœ s1์—์„œ i=2๋ณด๋‹ค ์ž‘์œผ๋ฉด์„œ, s2[j=3]='c'์™€ ๊ฐ™์€ ๋ฌธ์ž๋ฅผ ๊ฐ–๋Š” ๊ณณ์˜ ์œ„์น˜๋ฅผ k๋ผ๊ณ  ํ•˜์ž! ์œ„์˜ ์˜ˆ์‹œ์—์„œ๋Š” l=0, k=0์ด๋‹ค.

์ด๋•Œ, transposition์€ 3๋‹จ๊ณ„๋ฅผ ํ†ตํ•ด ์ˆ˜ํ–‰๋œ๋‹ค.

  1. s1[k]์™€ s1[i] ์‚ฌ์ด๋ฅผ ์ „๋ถ€ ์—†์• ๊ณ  (cost = i-k-1)
  2. s1[k]์™€ s1[i]๋ฅผ ๊ตํ™˜ํ•˜๊ณ  (cost = 1)
  3. s2[l]๊ณผ s2[j] ์‚ฌ์ด์— ์žˆ๋Š” ๋ชจ๋“  ๋ฌธ์ž๋ฅผ ์‚ฝ์ž… (cost = j-l-1)

๋”ฐ๋ผ์„œ, transposition์˜ ๋น„์šฉ์€ ์ดํ•ฉ (i-k-1) + 1 + (j-l-1)์ด ๋œ๋‹ค!

์ด๊ฒƒ์„ ์ฝ”๋“œ๋กœ ํ‘œํ˜„ํ•˜๋ฉด,

dp[i][j] = min(insertion, deletion, replacement, dp[k][l] + (i-k-1) + 1 + (j-1-1))

์ด ๋˜๋Š” ๊ฒƒ์ด๋‹ค


์ž! ์ด์ œ ์ฝ”๋“œ ๋ ˆ๋ฒจ์—์„œ ์‚ดํŽด๋ณด์ž. ์ฝ”๋“œ๋Š” Wikipedia์˜ pseudo-code๋ฅผ ์ดํ•ดํ•˜๊ณ  ๊ทธ๋Œ€๋กœ C++๋กœ ์˜ฎ๊ฒจ ์ž‘์„ฑํ•˜์˜€๋‹ค. ์ฝ”๋“œ์—์„œ k, l๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด da[], db๋ฅผ ๋„์ž…ํ•˜๋Š” ํ…Œํฌ๋‹‰์„ ์‚ฌ์šฉํ–ˆ๋‹ค๋Š” ์ ๋„ ์ฃผ๋ชฉํ• ๋งŒ ํ•˜๋‹ค ๐Ÿคฉ

์žฌ๊ท€ ์—†์ด Button-Up ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„ํ–ˆ๋‹ค.

#include <bits/stdc++.h>
#define MAX 1005

using namespace std;

int dp[MAX][MAX];
int da[26]; // s1์—์„œ ํ˜„์žฌ i๋ณด๋‹ค ์ž‘์œผ๋ฉด์„œ, s2[j]์™€ ๊ฐ™์€ ๋ฌธ์ž๋ฅผ ๊ฐ–๋Š” ๊ณณ์˜ ์œ„์น˜; $k$

int Damerau_Levenshtein(string a, string b) {
  // initialization
  int maxDist = a.size() + b.size();

  dp[0][0] = maxDist;
  for (int i = 1; i <= a.size() + 1; i++) {
    dp[i][0] = maxDist;
    dp[i][1] = i - 1;
  }

  for (int j = 1; j <= b.size() + 1; j++) {
    dp[0][j] = maxDist;
    dp[1][j] = j - 1;
  }

  fill_n(da, 26, 1);

  // DP process
  for (int i = 2; i < a.size() + 2; i++) {
    // db: s2์—์„œ ํ˜„์žฌ j๋ณด๋‹ค ์ž‘์œผ๋ฉด์„œ, s1[i]์™€ ๊ฐ™์€ ๋ฌธ์ž๋ฅผ ๊ฐ–๋Š” ๊ณณ์˜ ์œ„์น˜; $l$
    int db = 1;
    for (int j = 2; j < b.size() + 2; j++) {
      int k = da[b[j - 2] - 'a'];
      int l = db;
      int cost;
      if (a[i - 2] == b[j - 2]) {
        cost = 0;
        db = j; // ํ˜„์žฌ s[i]์™€ ๊ฐ™์€ ๊ณณ์˜ ์œ„์น˜๋ฅผ db์— ๊ธฐ๋ก!
      } else {
        cost = 1;
      }
      dp[i][j] = min(dp[i - 1][j - 1] + cost,
                     min(dp[i][j - 1] + 1,
                         min(dp[i - 1][j] + 1,
                             dp[k - 1][l - 1] + (i - k - 1) + 1 + (j - l - 1))));
    }
    da[a[i - 2] - 'a'] = i;
  }

  return dp[a.size() + 1][b.size() + 1];
}

int main() {
  string s1, s2;
  cin >> s1;
  cin >> s2;

  cout << Damerau_Levenshtein(s1, s2);

  return 0;
}

๋””๋ฒ„๊ทธ๋ฅผ ์œ„ํ•ด ์ถœ๋ ฅํ•จ์ˆ˜๋„ ๋งŒ๋“ค์–ด ๋’€๋‹ค.

void printDP(int N, int M) {
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < M; j++) {
      printf("%d ", dp[i][j]);
    }
    printf("\n");
  }
  printf("da: ");
  for (int k = 0; k < 26; k++) {
    printf("%d ", da[k]);
  }
  printf("\n");
  printf("==========================\n");
}

!! 15484๋ฒˆ: ์ตœ์†ŒํŽธ์ง‘2๋ฅผ ํ•ด๊ฒฐํ–ˆ๋‹ค!

์œ„์˜ ์ฝ”๋“œ์—์„œ ์•ฝ๊ฐ„๋งŒ ์ˆ˜์ •ํ•˜๋ฉด, 2326๋ฒˆ: ์ตœ์†Œ ํŽธ์ง‘ ๋ฌธ์ œ2๋„ ์‰ฝ๊ฒŒ AC๋ฅผ ๋ฐ›์„ ์ˆ˜ ์žˆ๋‹ค ใ…Žใ…Ž

์ฐธ๊ณ ์ž๋ฃŒ


  1. ๋Œ€์ถฉ [๋””๋ฉ”๋ผ์šฐ-๋ฆฌ๋ฒค์Šˆํ…Œ์ธ] ๊ฑฐ๋ฆฌ๋ผ๊ณ  ์ฝ์œผ๋ฉด ๋  ๊ฒƒ ๊ฐ™๋‹ค. ์ด๊ณณ์˜ ๋ฐœ์Œ์„ ์ฐธ๊ณ ํ–ˆ๋‹ค.ย 

Categories:

Updated: