์—˜๋ผ์Šคํ‹ฑ์„œ์น˜ ๋‚ด๋ถ€ ํƒํ—˜: ๋ฃจ์”ฌ ์„ธ๊ทธ๋จผํŠธ๋กœ ์ดํ•ดํ•˜๋Š” ๊ฒ€์ƒ‰ ์—”์ง„์˜ ์„ฌ์„ธํ•œ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ๐Ÿ”ฆ

Let's look at the Lucene index that makes up a shard in Elasticsearch, and at the Lucene segment, which implements the inverted index structure and document search. Once you understand Lucene segments, you understand the core of how Elasticsearch works!

Structure of an Elasticsearch Index

An Elasticsearch index is split into multiple shards. A shard is a kind of partition of the data.

Each ES shard holds exactly one Lucene index. In fact, an ES shard is little more than an extension of a Lucene index, so you can think of the two as nearly the same thing!

A Lucene index consists of multiple Lucene segments. It is inside these segments that the ES documents and the inverted index live.

Lucene Index and Segment

Lucene uses the Lucene index and Lucene segment data structures to provide near-real-time search. Let's walk through the logic for searching, creating, deleting, and updating data in a Lucene index and its segments.

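# Document and InvertedIndex are placeholders used throughout this post.
class LuSegment:
  def __init__(self):
    self.documents = []                    # e.g. [Document(1), Document(2), ...]
    self.inverted_index = InvertedIndex()

class LuIndex:
  def __init__(self):
    self.segments = []                     # e.g. [LuSegment(), LuSegment(), ...]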
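
The pseudocode above treats Document and InvertedIndex as given. To make the idea concrete, here is a minimal runnable sketch of both. The field names (doc_id, text), the whitespace tokenizer, and the callable lookup are assumptions made for illustration, not how Lucene actually stores its postings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str


class InvertedIndex:
    """Maps each term to the positions of the documents that contain it."""

    def __init__(self, documents=()):
        self.postings = defaultdict(list)
        for position, document in enumerate(documents):
            # Naive analysis (an assumption): lowercase and split on whitespace.
            for term in set(document.text.lower().split()):
                self.postings[term].append(position)

    def __call__(self, term: str) -> list:
        # Return a copy so callers cannot mutate the postings list.
        return list(self.postings.get(term.lower(), []))


docs = [
    Document("1", "Lucene stores segments"),
    Document("2", "Elasticsearch wraps Lucene"),
]
index = InvertedIndex(docs)
print(index("lucene"))    # [0, 1]
print(index("segments"))  # [0]
```

The point of the structure is that a lookup goes from a term straight to the documents containing it, without scanning every document.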

๋ฌธ์„œ ๊ฒ€์ƒ‰

Lucene Index์˜ ๊ฒ€์ƒ‰์€ ์ธ๋ฑ์Šค๊ฐ€ ๊ฐ€์ง„ N๊ฐœ์˜ Lucene Segment์—์„œ ๊ฒ€์ƒ‰ํ•ด์„œ ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ์กฐํ•ฉํ•œ ๊ฒƒ์ด๋‹ค. ์ฝ”๋“œ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

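class LuSegment:
  def __init__(self):
    self.documents = []                    # e.g. [Document(1), Document(2), ...]
    self.inverted_index = InvertedIndex()

  def search(self, qry: str):
    # The inverted index is assumed to return the positions of matching documents.
    doc_idxs = self.inverted_index(qry)
    return [self.documents[i] for i in doc_idxs]


class LuIndex:
  def __init__(self):
    self.segments = []                     # e.g. [LuSegment(), LuSegment(), ...]

  def search(self, qry: str):
    qry_ret = []
    for segment in self.segments:
      # Search every segment independently and collect its result.
      ret = segment.search(qry)
      qry_ret.append(ret)

    return qry_ret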

๋ฌผ๋ก  ์œ„์˜ ์ฝ”๋“œ๋Š” ์ดํ•ด๋ฅผ ์œ„ํ•ด Lucene Index์™€ Segment์˜ ๊ฒ€์ƒ‰์„ ๋‹จ์ˆœํ™” ํ•œ ๊ฒƒ์ด๋‹ค. ๊ฐ Segment์—์„œ์˜ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ์ทจํ•ฉํ•˜๋Š” ๊ฒƒ๋„ ๋‹จ์ˆœํžˆ .append() ํ•˜์ง„ ์•Š์„ ๊ฒƒ์ด๋‹ค.

์ด๋ ‡๊ฒŒ Lucene Indexd์—์„œ Segment ๋‹จ์œ„๋กœ ๊ฒ€์ƒ‰ํ•˜๊ณ  ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ์ทจํ•ฉํ•˜๋Š” ๋ฐฉ์‹์„ โ€œ์„ธ๊ทธ๋จผํŠธ ๋‹จ์œ„ ๊ฒ€์ƒ‰(Per-Segment Search)โ€์ด๋ผ๊ณ  ํ•œ๋‹ค.

๋ฌธ์„œ ์ƒ์„ฑ

์ƒˆ๋กœ์šด Document๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด Lucene Index๋Š” ์ƒˆ๋กœ์šด Lucene Segment๋ฅผ ๋งŒ๋“ค์–ด ์ €์žฅํ•ด๋‘”๋‹ค.

class LuSegment:
  def __init__(self, document: Document):
    # A new segment starts out holding only the newly added document.
    self.documents = [document]
    self.inverted_index = InvertedIndex(self.documents)


class LuIndex:
  def __init__(self):
    self.segments = []                     # e.g. [LuSegment(doc1), LuSegment(doc2), ...]

  def insert(self, document: Document):
    new_segment = LuSegment(document)
    self.segments.append(new_segment)


๊ทธ๋Ÿฌ๋‚˜ Document๋ฅผ ์ถ”๊ฐ€๋กœ ์„ธ๊ทธ๋จผํŠธ๊ฐ€ ์ƒ์„ฑ๋˜๊ธฐ๋งŒ ํ•œ๋‹ค๋ฉด Segment๋Š” ๋Š˜ ํ•˜๋‚˜์˜ Document๋งŒ ๊ฐ€์ง€๊ฒŒ ๋œ๋‹ค. ๊ทธ๋ž˜์„œ Lucene Index๋Š” ์ฃผ๊ธฐ์ ์œผ๋กœ โ€œMergeโ€ ์ž‘์—…์„ ํ†ตํ•ด Segment๋ฅผ ๋ณ‘ํ•ฉํ•œ๋‹ค.

class LuSegment:
  def __init__(self, seg1: "LuSegment", seg2: "LuSegment"):
    # Merge constructor: combine both document lists and rebuild the index.
    self.documents = seg1.documents + seg2.documents
    self.inverted_index = InvertedIndex(self.documents)


class LuIndex:
  def __init__(self):
    self.segments = []                     # e.g. [LuSegment(...), LuSegment(...), ...]

  def merge(self):
    seg1 = self.segments[0]
    seg2 = self.segments[1]

    new_segment = LuSegment(seg1, seg2)
    self.segments.append(new_segment)

    # Drop the two source segments now that the merged one is in place.
    del self.segments[0:2]

Two segments of the Lucene index are picked and a new segment is created from them. Once the two segments are merged, the Lucene index has fewer segments to scan during a search!

Of course, the code above simplifies the merge operation for the sake of understanding! In practice, how the segments to merge are chosen is more complex, and the merge is also when documents marked as deleted are "physically deleted"!

๋ฌธ์„œ ์‚ญ์ œ

Lucene Index์—์„œ Document์™€ Lucene Segment๋Š” ๋ถˆ๋ณ€์„ฑ(immutability)๋ฅผ ๊ฐ€์ง„๋‹ค. ์ด๊ฒƒ์€ ๋ฌธ์„œ์— ๋Œ€ํ•œ ์‚ญ์ œ ์š”์ฒญ์ด ๋ฐœ์ƒํ•ด๋„ ํ•ด๋‹น Document๋ฅผ ์‹ค์ œ๋กœ ๋ฌผ๋ฆฌ์  ๊ณต๊ฐ„์—์„œ ์‚ญ์ œํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค๋Š” ๋ง์ด๋‹ค! ๋‹ค๋งŒ, ์œ ์ €(Client) ์ž…์žฅ์—์„  ๋ฌธ์„œ๊ฐ€ ์‚ญ์ œ๋˜์—ˆ๋‹ค๋Š” ์‘๋‹ต์€ ์ •์ƒ์ ์œผ๋กœ ๋ฐ›๋Š”๋‹ค.

๋Œ€์‹  ์‚ญ์ œ ์š”์ฒญ ์˜จ ๋ฌธ์„œ์˜ id๋ฅผ ๊ฒ€์ƒ‰ํ•ด์„œ ํ•ด๋‹น id๋ฅผ ๊ฐ€์ง„ ๋ฌธ์„œ๋“ค์— "์‚ญ์ œ๋จ"๋ผ๋Š” ํ‘œ์‹œ๋งŒ ํ•ด๋‘”๋‹ค. ๊ทธ๋Ÿผ ๋ฌผ๋ฆฌ์  ์‚ญ์ œ๋Š” ์–ธ์ œ ์ผ์–ด๋‚˜๋Š”๊ฐ€? ๋ฌผ๋ฆฌ์  ์‚ญ์ œ๋Š” Segment๊ฐ€ โ€œ๋จธ์ง€โ€๋  ๋•Œ, Segment์˜ ๋ฌธ์„œ ์ค‘์— "์‚ญ์ œ๋จ" ํ‘œ์‹œ๊ฐ€ ์žˆ๋Š” ๋ฌธ์„œ๋ฅผ ์‚ญ์ œํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

class LuSegment:
  def __init__(self, seg1: "LuSegment", seg2: "LuSegment"):
    # Merge constructor: copy over only the documents not marked as deleted.
    self.documents = []
    for document in (seg1.documents + seg2.documents):
      if document.is_deleted:
        continue  # physical deletion happens here
      self.documents.append(document)

    self.inverted_index = InvertedIndex(self.documents)

  def delete(self, doc_id: str):
    # Mark only: the document stays in the segment until the next merge.
    for doc_idx in self.inverted_index(doc_id):
      self.documents[doc_idx].is_deleted = True


class LuIndex:
  def __init__(self):
    self.segments = []                     # e.g. [LuSegment(...), LuSegment(...), ...]

  def delete(self, doc_id: str):
    for segment in self.segments:
      segment.delete(doc_id)

  def merge(self):
    ...
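
One way to keep the stored documents truly untouched is to record the deletion marks next to them rather than on them, and to filter at query time. The sketch below does this with a plain Python set. TombstoneSegment and its methods are hypothetical names, and the substring-free word match is only a stand-in for a real query.

```python
class TombstoneSegment:
    """Hypothetical segment: stored documents plus a separate deletion set."""

    def __init__(self, documents):
        # doc_id -> text; written once and never modified afterwards.
        self._documents = dict(documents)
        self._deleted = set()

    def delete(self, doc_id):
        # Mark only: the stored document itself is left untouched.
        if doc_id in self._documents:
            self._deleted.add(doc_id)
            return True
        return False

    def search(self, word):
        # Documents marked as deleted are filtered out at query time.
        return [
            doc_id
            for doc_id, text in self._documents.items()
            if doc_id not in self._deleted and word in text.split()
        ]

    def stored_count(self):
        return len(self._documents)


segment = TombstoneSegment({"1": "lucene segment", "2": "lucene index"})
segment.delete("1")
print(segment.search("lucene"))  # ['2']
print(segment.stored_count())    # 2
```

The client no longer sees document 1, yet the segment still stores two documents until a merge rewrites it.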

๋ฌธ์„œ ๋ณ€๊ฒฝ

ES์—์„œ ๋ฌธ์„œ์˜ ๋ณ€๊ฒฝ์€ DB์—์„œ์˜ ๋ณ€๊ฒฝ๊ณผ ๋‹ฌ๋ฆฌ Overwrite(Delete & Write)๊ฐ€ ์•„๋‹Œ ์ƒˆ๋กœ Document๋ฅผ ๋งŒ๋“  ํ›„ ๋ฌธ์„œ์˜ โ€œ๋ฒ„์ „โ€์„ ์˜ฌ๋ฆฌ๋Š” ๊ฒƒ์ด๋‹ค. ์ฆ‰, Mark Delete๋ฅผ ํ•œ ํ›„ Write๋ฅผ ํ•˜๋Š” ์…ˆ์ด๋‹ค! ๊ทธ๋ž˜์„œ ๋ฌธ์„œ๋ฅผ ๋ณ€๊ฒฝํ•˜๋Š” ๊ฒƒ์€ ์‚ฌ์‹ค Lucene Index์—์„œ ๋ฌธ์„œ๋ฅผ ์‚ญ์ œํ•˜๊ณ  ์ƒˆ๋กœ ๋ฌธ์„œ๋ฅผ ํ•˜๋‚˜ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ๊ณผ ๊ฐ™๋‹ค! ๋‹จ, ์—ฌ๊ธฐ์„œ์˜ ๋ฌธ์„œ ์‚ญ์ œ๋Š” ๋ฌผ๋ฆฌ์  ์‚ญ์ œ๊ฐ€ ์•„๋‹ˆ๋ผ Mark Delete ํ•˜๋Š” ๊ฒƒ์„ ๋งํ•œ๋‹ค.

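class LuIndex:
  def __init__(self):
    self.segments = []                     # e.g. [LuSegment(...), LuSegment(...), ...]

  def update(self, doc_id: str, document: Document):
    self.delete(doc_id)      # mark the old copy as deleted
    self.insert(document)    # write the new copy into a new segment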
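
The version bump is not visible in the pseudocode above, so here is a self-contained sketch that tracks it. VersionedIndex, its dictionary-based segments, and the versions map are all assumptions for illustration, with one document per segment as in the creation example.

```python
class VersionedIndex:
    """Hypothetical index: every write creates a new single-document segment."""

    def __init__(self):
        self.segments = []   # each segment: doc_id, version, body, deleted
        self.versions = {}   # doc_id -> latest version number

    def insert(self, doc_id, body):
        version = self.versions.get(doc_id, 0) + 1
        self.versions[doc_id] = version
        self.segments.append(
            {"doc_id": doc_id, "version": version, "body": body, "deleted": False}
        )

    def delete(self, doc_id):
        for segment in self.segments:
            if segment["doc_id"] == doc_id:
                segment["deleted"] = True  # mark delete only

    def update(self, doc_id, body):
        self.delete(doc_id)        # old copy is kept, flagged as deleted
        self.insert(doc_id, body)  # new copy gets the next version number

    def get(self, doc_id):
        for segment in self.segments:
            if segment["doc_id"] == doc_id and not segment["deleted"]:
                return segment["version"], segment["body"]
        return None


index = VersionedIndex()
index.insert("1", "first draft")
index.update("1", "second draft")
print(index.get("1"))       # (2, 'second draft')
print(len(index.segments))  # 2
```

After the update there are two segments: the old one, still present but marked as deleted, and the new one holding version 2.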

์„ธ๊ทธ๋จผํŠธ ๋ถˆ๋ณ€์„ฑ

์™œ Lucene์€ ์„ธ๊ทธ๋จผํŠธ ๋ถˆ๋ณ€์„ฑ(Segment Immutability)๋ฅผ ์ฑ„ํƒํ•œ ๊ฒƒ์ผ๊นŒ?

๋จผ์ € ์„ธ๊ทธ๋จผํŠธ ๋ถˆ๋ณ€์„ฑ์ด ์—†๋Š” ์ƒํ™ฉ์—์„œ ๋ฌธ์„œ๋ฅผ ์‚ญ์ œ ํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•ด๋ณด์ž. ๊ทธ๋Ÿฌ๋ฉด Segment์˜ (1) ๋ฌธ์„œ ๋ฆฌ์ŠคํŠธ์—์„œ ๋ฌธ์„œ๋ฅผ ์‚ญ์ œํ•˜๊ณ  (2) Inverted Index๋ฅผ ๊ฐฑ์‹ ํ•˜๋Š” ๋‘ ๊ณผ์ •์„ ๋‹ค์‹œ ์ˆ˜ํ–‰ํ•ด์ค˜์•ผ ํ•œ๋‹ค.

๊ทธ๋Ÿฐ๋ฐ Segment๊ฐ€ ์ˆ˜์ • ํ•˜๋Š” ์ค‘์ธ๋ฐ, ํ•ด๋‹น Segment์— ๊ฒ€์ƒ‰ ์š”์ฒญ์ด ์˜ค๋Š” ๊ฒฝ์šฐ๋ฅผ ์ƒ๊ฐํ•ด๋ณด์ž. ๊ทธ๋Ÿฌ๋ฉด ๊ฒ€์ƒ‰ ์ž‘์—… ์ž…์žฅ์—์„œ๋Š” 2๊ฐ€์ง€ ์„ ํƒ์ง€๊ฐ€ ์žˆ๋Š”๋ฐ

  1. ์ˆ˜์ • ์ค‘์ธ Segment๋Š” ์Šคํ‚ตํ•œ๋‹ค.
  2. Segment ์ˆ˜์ •์ด ๋๋‚  ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ฆฐ๋‹ค.

1๋ฒˆ ๋ฐฉ์‹์„ ์ฑ„ํƒํ•˜๋ฉด, ์ˆ˜์ • ์ค‘์ธ Segment์— ์žˆ๋Š” Document๊ฐ€ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ์—์„œ ์Šคํ‚ต ๋˜๊ธฐ ๋•Œ๋ฌธ์— ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๊ฐ€ ๋ถ€์ •ํ™• ํ•ด์ง„๋‹ค. 2๋ฒˆ ๋ฐฉ์‹์„ ์ฑ„ํƒํ•˜๋ฉด, ๋ฌธ์„œ ์‚ญ์ œ ์š”์ฒญ ์ดํ›„์— ํŠธ๋ฆฌ๊ฑฐ๋œ ๋ชจ๋“  ๊ฒ€์ƒ‰์ด Segment ์ˆ˜์ •์ด ์™„๋ฃŒ๋  ๋•Œ๊นŒ์ง€ pending ๋˜์–ด๋ฒ„๋ฆฐ๋‹ค!

๊ฒฐ๊ตญ, Segment๋ฅผ ์ˆ˜์ •ํ•˜๋Š” ์ž‘์—… ์ž์ฒด๊ฐ€ ๊ฒ€์ƒ‰ ์—”์ง„์—๊ฒŒ๋Š” ๋ถ€๋‹ด์Šค๋Ÿฌ์šด ์ž‘์—…์ด๋ผ๋Š” ๋ง์ด๋‹ค!

๊ฒฐ๊ตญ Lucene์€ ์„ธ๊ทธ๋จผํŠธ ๋‚ด์˜ ๋ฌธ์„œ์— ๋ณ€๊ฒฝ/์‚ญ์ œ ์ž‘์—…์ด ์ผ์–ด๋‚˜๋ฉด ์ผ๋‹จ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋ฌธ์„œ๋ฅผ "์‚ญ์ œ๋จ"์œผ๋กœ ํ‘œ์‹œํ•ด๋‘๊ณ , ์ƒˆ๋กœ์šด ๋ฒ„์ „์˜ ๋ฌธ์„œ๊ฐ€ ๋‹ด๊ธด Segment๋ฅผ ์ƒ์„ฑํ•˜๊ฒŒ ๋œ ๊ฒƒ์ด๋‹ค!