Several techniques for optimizing data stored in Delta Lake: Compaction, Data Skipping, Z-Ordering


At work we run our Spark clusters through Databricks. Please note up front that this post is written with Databricks as the baseline.

Compaction

Query time on a Parquet table depends not only on the size of the table but also on how finely the table is split into files. For example, a table made up of a few thousand small Parquet files can still be queried, but the query reportedly runs very slowly. The overhead comes from having to list a large number of files, and then open and close each of them, just to read one table. The official Delta Lake blog calls this "the Small File Problem".

To see this in practice, I took a table of about 300,000 rows, repartitioned it into 1000 Parquet files, saved it, and compared the two layouts. The difference was as follows.

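+---------+-------------+--------+------------+
| rows    | repartition | size   | query time |
+---------+-------------+--------+------------+
| 300,000 | 8           | 73 MB  | 9 sec      |
| 300,000 | 1000        | 100 MB | 42 sec     |
+---------+-------------+--------+------------+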

The table split into 1000 Parquet files is both slower to query and larger on disk.

An easy fix for this fragmentation is "Compaction": merging the finely split Parquet files into one larger file. Delta's OPTIMIZE command performs compaction on small files.

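+-------------------+---------+-------------+--------+------------+
|                   | rows    | repartition | size   | query time |
+-------------------+---------+-------------+--------+------------+
| Before compaction | 300,000 | 1000        | 100 MB | 42 sec     |
| After compaction  | 300,000 | 1           | 73 MB  | 15 sec     |
+-------------------+---------+-------------+--------+------------+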

How far compaction merges files is determined by the Spark option spark.databricks.delta.optimize.maxFileSize, whose default is 1 GB. That is why the 1000 Parquet files in the experiment above were merged into a single Parquet file. (Delta Lake is clearly open source, and yet the key says spark.databricks....?) Delta's compaction reportedly distributes file sizes in an evenly balanced way. If the data were 1.2 GB, it would not be split into a 1 GB file and a 0.2 GB file according to maxFileSize; it would be compacted into two 0.6 GB files.
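The arithmetic behind the 1.2 GB example can be sketched in a few lines of Python. This is only an illustration of the "evenly balanced" idea, not Delta's actual bin-packing code; the function name and the choice of MB as the unit (with 1 GB taken as 1000 MB) are assumptions made for the example.

```python
import math


def balanced_file_sizes(total_mb: int, max_file_mb: int) -> list[int]:
    """Split total_mb into the fewest files that fit under max_file_mb,
    keeping the individual file sizes as even as possible."""
    if total_mb <= 0:
        return []
    n_files = math.ceil(total_mb / max_file_mb)
    base, extra = divmod(total_mb, n_files)
    # The first `extra` files carry one additional MB so the sizes sum exactly.
    return [base + 1 if i < extra else base for i in range(n_files)]


# 1.2 GB of data with a 1 GB cap: two 0.6 GB files, not 1 GB + 0.2 GB.
print(balanced_file_sizes(1200, 1000))
# The 100 MB table from the experiment fits in a single file.
print(balanced_file_sizes(100, 1000))
```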

Every read in Delta is isolated under Snapshot Isolation. So if OPTIMIZE runs while someone is reading from Delta Lake, or someone reads while OPTIMIZE is running, neither job is interrupted and both can proceed at the same time.

Delta's compaction is idempotent. Running OPTIMIZE two or more times on the same path or the same table has no effect on the data. In fact, from the second OPTIMIZE onward no further optimization takes place at all.

If there are only a few small files, then you don't need to run OPTIMIZE. The small file overhead only starts to become a performance issue where there are lots of small files. You also don't need to run OPTIMIZE on data that's already been compacted. If you have an incremental update job, make sure to specify predicates to only compact the newly added data.

According to the official Delta blog, unless "the Small File Problem" is actually hurting query performance, there is no need to run OPTIMIZE often. If the data is already sufficiently compacted, there is little difference before and after OPTIMIZE. Likewise, when data is added incrementally there is no need to compact the whole range; the blog recommends restricting compaction to a range, as below.

OPTIMIZE <TABLE_NAME> WHERE `date` >= '2024-01-01'

Auto Compaction

This option runs compaction immediately after a write to Delta Lake succeeds. Auto-compaction is reportedly applied only to files that have not been compacted before.

If compaction ran after every single write, writes could become slow, so you can decide how many files must accumulate before auto-compaction runs. The frequency is adjusted with the Spark option spark.databricks.delta.autoCompact.minNumFiles, whose default is 50.

There are two ways to enable the feature: through a Spark session config, or through a table property.

  • Table Property: delta.autoOptimize.autoCompact
  • SparkSession Config: spark.databricks.delta.autoCompact.enabled
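As a rough sketch of how the two switches might be applied, the helpers below only build the SQL text; on Databricks each string could then be passed to spark.sql(...). The helper names and the table name `events` are hypothetical, and the 'true'/'false' values are an assumption about how the property is normally set.

```python
def table_property_statement(table: str, enabled: bool = True) -> str:
    """Per-table switch via the delta.autoOptimize.autoCompact table property."""
    value = "true" if enabled else "false"
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = '{value}')"
    )


def session_config_statements(enabled: bool = True, min_num_files: int = 50) -> list[str]:
    """Session-wide switch plus the file-count threshold mentioned above."""
    value = "true" if enabled else "false"
    return [
        f"SET spark.databricks.delta.autoCompact.enabled = {value}",
        f"SET spark.databricks.delta.autoCompact.minNumFiles = {min_num_files}",
    ]


# `events` is a made-up table name used only for illustration.
print(table_property_statement("events"))
for statement in session_config_statements(min_num_files=100):
    print(statement)
```

The table property travels with the table, while the session config only affects writes made from that Spark session, so which one fits depends on who writes to the table.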

Data Skipping

To speed up queries, Delta Lake implements logic that selects or skips the data to read based on the query's WHERE condition. As a result, some queries return data faster than they would against plain Parquet files.

Delta Statistics Columns

Delta's data skipping information is collected automatically whenever data is written to a Delta table. It contains the following:

  • min/max values per column
  • null counts per column
  • total records

These statistics are stored in the log file of the corresponding write version under _delta_log/, one entry per newly added Parquet file. For example, if the write produced version 4, they are recorded in _delta_log/000...0004.json as shown below.

// https://delta-io.github.io/delta-rs/how-delta-lake-works/architecture-of-delta-table/
{
  "add": {
    "path": "2-95ef2108-480c-4b89-96f0-ff9185dab9ad-0.parquet",
    "size": 2204,
    "partitionValues": {},
    "modificationTime": 1701740465102,
    "dataChange": true,
    "stats": "{
      \"numRecords\": 2,
      \"minValues\": {\"num\": 11, \"letter\": \"aa\"},
      \"maxValues\": {\"num\": 22, \"letter\": \"bb\"},
      \"nullCount\": {\"num\": 0, \"letter\": 0}
    }"
  }
}
{
  "remove": {
    "path": "0-62dffa23-bbe1-4496-8fb5-bff6724dc677-0.parquet",
    "deletionTimestamp": 1701740465102,
    "dataChange": true,
    "extendedFileMetadata": false,
    "partitionValues": {},
    "size": 2208
  }
}
{
  "commitInfo": {
    "timestamp": 1701740465102,
    "operation": "WRITE",
    "operationParameters": {
      "mode": "Overwrite",
      "partitionBy": "[]"
    },
    "clientVersion": "delta-rs.0.17.0"
  }
}

Looking at the values, the stats field of the add action holds numRecords, minValues, maxValues, and nullCount. Using this information, Delta can look up and return values faster than a query against the Parquet files would.

For example, for a query that counts rows over the whole table, such as SELECT COUNT(*) FROM <TABLE_NAME>, it is enough to add up the numRecords values recorded in stats for each Parquet file that Delta reads. The Parquet files themselves never need to be opened.
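The COUNT(*) shortcut can be sketched with nothing but the standard library. This is a simplified illustration, not how Spark or delta-rs actually plans the query: it reads a single commit and ignores remove actions, whereas a real reader first reconstructs the current snapshot from all commits. The function name and the sample file names are made up.

```python
import json


def count_rows_from_log(log_lines):
    """Sum numRecords over the add actions of one commit (one JSON action per line)."""
    total = 0
    for line in log_lines:
        action = json.loads(line)
        add = action.get("add")
        if add is None or not add.get("stats"):
            continue  # commitInfo, remove, or an add without stats
        stats = json.loads(add["stats"])  # stats is itself a JSON-encoded string
        total += stats["numRecords"]
    return total


# Hypothetical commit with two added files and a commitInfo action.
log_lines = [
    json.dumps({"add": {"path": "part-0.parquet", "stats": json.dumps({"numRecords": 2})}}),
    json.dumps({"add": {"path": "part-1.parquet", "stats": json.dumps({"numRecords": 5})}}),
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
]
print(count_rows_from_log(log_lines))  # 7
```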


However, computing these statistics adds overhead compared with a plain Parquet write. You may also want to skip statistics for data types where min/max is expensive to compute (textual data, for example). In that case, use the two Spark session configs below.

  • spark.databricks.delta.properties.defaults.dataSkippingNumIndexedCols
  • spark.databricks.delta.properties.defaults.dataSkippingStatsColumns

dataSkippingNumIndexedCols sets the number of columns for which stats are collected. The default is 32, meaning stats are computed for the first 32 columns. Setting it to -1 collects stats for every column.

dataSkippingStatsColumns takes a comma-separated list of the column names for which stats should be collected.
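To make the interaction of the two settings concrete, here is a small model of which columns end up with stats. It is a sketch of the behavior described above, not Delta's code; the function name is made up, and the rule that an explicit column list takes precedence over the column count is my understanding of the Databricks documentation rather than something verified here.

```python
def columns_with_stats(schema_columns, num_indexed_cols=32, stats_columns=None):
    """Return the columns that would get stats under the two settings.

    Assumption: an explicit stats_columns list wins over num_indexed_cols.
    """
    if stats_columns:
        wanted = {name.strip() for name in stats_columns.split(",")}
        return [col for col in schema_columns if col in wanted]
    if num_indexed_cols == -1:
        return list(schema_columns)  # -1 means every column
    return list(schema_columns[:num_indexed_cols])  # first N columns in schema order


# Hypothetical schema where `body` is a long text column we would rather skip.
columns = ["id", "created_at", "body"]
print(columns_with_stats(columns, num_indexed_cols=2))
print(columns_with_stats(columns, num_indexed_cols=-1))
print(columns_with_stats(columns, stats_columns="id, created_at"))
```

Because the count applies to the first N columns in schema order, placing wide text columns late in the schema is one way to keep them out of stats collection without naming columns explicitly.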

Alternatively, the same behavior can be controlled with table properties. The two properties below correspond functionally to the Spark session configs above.

  • delta.dataSkippingNumIndexedCols
  • delta.dataSkippingStatsColumns

Get Column Statistics

The column stats collected during Delta writes can be checked with the query below.

> DESC EXTENDED <TABLE_NAME> <COLUMN_NAME>
     info_name info_value
 -------------- ----------
       col_name       name
      data_type     string
        comment       NULL
            min       NULL
            max       NULL
      num_nulls          0
 distinct_count          2
    avg_col_len          4
    max_col_len          4
      histogram       NULL

Z-Ordering

With Delta, adding a ZORDER BY clause to OPTIMIZE lets you decide the order in which the data in the Parquet partitions is sorted. Let's look at an example.

// https://delta.io/blog/2023-06-03-delta-lake-z-order/
+-----+-----+------------+---+---+------+---+---+---------+
|  id1|  id2|         id3|id4|id5|   id6| v1| v2|       v3|
+-----+-----+------------+---+---+------+---+---+---------+
|id016|id046|id0000109363| 88| 13|146094|  4|  6|18.837686|
|id039|id087|id0000466766| 14| 30|111330|  4| 14|46.797328|
|id095|id078|id0000584803| 56| 92|213320|  1|  9|63.464315|
+-----+-----+------------+---+---+------+---+---+---------+

Suppose there are about one million rows of data in the format above, and we want to run a query that picks out rows by a specific value of the id1 column.

SELECT
  COUNT(*)
FROM
  <TABLE_NAME>
WHERE
  id1 = 'id016'

If the table is partitioned into 100 Parquet files and the value id1='id016' is scattered across all of them, the query above has to scan all 100 Parquet files.

But if the data in those 100 Parquet files were stored sorted by id1, so that with some luck all the id1='id016' rows end up in a single Parquet file, the same query would scan only that one file. The Delta stats recorded for that file would then show minimum and maximum id1 values that bracket 'id016'.

In other words, this is an optimization technique that groups data with similar properties (here, rows with the same id) into one file so that the data has locality.
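The effect of sorting on file pruning can be modeled with the per-file min/max stats alone. The sketch below is a toy model under stated assumptions: 100 files, id1 values from id001 to id100, and, in the sorted layout, exactly one id per file. The FileStats class and files_to_scan function are hypothetical names, not part of any Delta API.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    min_id1: str
    max_id1: str


def files_to_scan(files, value):
    """Keep only files whose [min, max] range for id1 could contain the value."""
    return [f.path for f in files if f.min_id1 <= value <= f.max_id1]


# Scattered layout: every file spans the full id1 range, so nothing can be skipped.
scattered = [FileStats(f"part-{i}.parquet", "id001", "id100") for i in range(100)]

# Sorted layout: each file holds a single id1 value.
sorted_layout = [
    FileStats(f"part-{i}.parquet", f"id{i + 1:03d}", f"id{i + 1:03d}") for i in range(100)
]

print(len(files_to_scan(scattered, "id016")))   # 100
print(files_to_scan(sorted_layout, "id016"))    # ['part-15.parquet']
```

The predicate and the stats are identical in both cases; only the physical layout changes how many files survive the min/max check.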


Storing the Parquet partitions that make up a Delta table sorted by a particular column is what Z-Ordering does. It runs together with OPTIMIZE, using a query like the one below.

OPTIMIZE <TABLE_NAME> ZORDER BY id1

Delta Lake Z Order

The Delta blog article shows that queries on a Z-Ordered column clearly improve far more than queries on columns that are not Z-Ordered.


Z-Ordering can also be applied to two or more columns. With more Z-Order columns, queries on several columns get somewhat faster, but OPTIMIZE has to do more sorting and the benefit gained from data locality can diminish. That is why it is important to analyze carefully how the data is queried and choose the Z-Order columns accordingly.
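For intuition on how more than one column can share a single sort order, here is the textbook Z-order (Morton) curve, which interleaves the bits of two integer keys. This is an assumption-laden illustration: the post does not describe Delta's internal implementation, which may differ, and the function name z_value is made up.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one sortable integer."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


# Sorting a 4x4 grid by z_value keeps points that are close in BOTH
# coordinates close together in the resulting order.
points = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered[:4])   # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Each added column takes its share of the interleaved bits, so the locality available to any single column shrinks, which matches the trade-off described above.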


The Delta blog post also notes that partitioning a table is generally not recommended when the whole table is under 1 TB, or when individual partitions would hold less than 1 GB of data.

You should not be partitioning tables under one terabyte in general. You also shouldn't partition by a column that will have partitions with less than 1 GB of data.

Compare to Hive-style partitioning

Delta's Z-Ordering and Hive-style partitioning are both techniques for grouping similar data into a single file or chunk of files. This lets certain queries read only part of the data, rather than all of it, and still return the result.

The difference lies in the physical layout. Hive-style partitioning places similar data in the same directory. Delta's Z-Ordering, by contrast, keeps similar data in a single directory alongside other kinds of data, with no directory separation.

Hive-style partitioning, which separates partitions completely, can be a strength in some situations. But directory separation comes with costs. If the partition column has too many distinct values, far too many partition directories are created. Once a partition column has been chosen it is very hard to change. And specifying several partition columns deepens the partition directory tree by the same amount.
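A quick back-of-the-envelope calculation shows how fast the directory count grows. The sketch assumes every combination of partition values actually occurs, which is an upper bound; the function name and the example columns are hypothetical.

```python
from math import prod


def partition_layout(distinct_counts: dict[str, int]) -> tuple[int, int]:
    """Return (upper bound on leaf directories, directory depth)
    for a set of Hive-style partition columns."""
    return prod(distinct_counts.values()), len(distinct_counts)


# One year of daily data partitioned by date alone.
print(partition_layout({"date": 365}))                   # (365, 1)
# Adding a second partition column multiplies the directories and adds a level.
print(partition_layout({"date": 365, "country": 200}))   # (73000, 2)
```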

That said, Z-Ordering and Hive-style partitioning are not mutually exclusive, because Delta supports Hive-style partitioning as well.

Analyze

Earlier I said that Delta automatically collects column stats on every write. The same thing can also be done explicitly by running a query, and that is the ANALYZE command.

-- requires DBR 14.x or above
ANALYZE TABLE <TABLE_NAME> COMPUTE DELTA STATISTICS

It reads all the Parquet files referenced by the latest Delta version and recomputes the stats. Because the stats are recomputed, a new commit is also created under _delta_log/, recorded as a COMPUTE STATS operation. No Parquet files are created or deleted, however.


ANALYZE can also be run with a few other options:

  • ANALYZE ... COMPUTE STATISTICS NOSCAN
    • Recomputes only the size of the Delta table.
  • ANALYZE ... COMPUTE STATISTICS FOR COLUMNS ...
    • Recomputes stats for the specified columns.
  • ANALYZE ... COMPUTE STATISTICS FOR ALL COLUMNS
    • Recomputes stats for all columns.

Why is it needed?

Delta already collects stats on every write, and OPTIMIZE ZORDER presumably computes stats as well, so is the ANALYZE command really necessary? And when should it be run?

After searching around, I found this answer on the Databricks Community: "What's the best practice on running ANALYZE on Delta Tables for query performance optimization?"

  • ANALYZE whenever the data has changed by about 10%
  • Make sure when you use ANALYZE, you are specifying the COLUMNS or PARTITIONS you want to collect statistics for. Otherwise, as you have noted, it will re-analyze the entire table

In any case, my guess is that the command exists separately for the cases where the data in a table has changed by some amount (say 10%), or where you want to control stats collection manually and run it independently of Delta writes.
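The 10% rule of thumb from that answer can be written as a tiny check that a scheduled job might run before issuing ANALYZE. This is only a sketch: the function name is made up, and how the number of changed rows is tracked is left open because the post does not say.

```python
def should_analyze(changed_rows: int, total_rows: int, threshold: float = 0.10) -> bool:
    """Return True once the share of changed rows reaches the threshold."""
    if total_rows <= 0:
        return False
    return changed_rows / total_rows >= threshold


# With the 300,000-row table from earlier: 45,000 changed rows is 15%.
print(should_analyze(45_000, 300_000))   # True
print(should_analyze(1_000, 300_000))    # False
```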

Write Performance Comparison

Delta collects stats every time it writes. This is extra work that a plain Parquet write does not do. The stats certainly help some read queries, but I was worried that the overhead of collecting them would degrade write performance.

According to the 2020 paper from Databricks, "Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores", Delta writes do include stats collection, but the overhead is negligible compared with Parquet writes. Below is an excerpt from the relevant paragraph of the paper.

We also evaluated the performance of loading a large dataset into Delta Lake as opposed to Parquet to test whether Delta's statistics collection adds significant overhead. Figure 7 shows the time to load a 400 GB TPC-DS store_sales table, initially formatted as CSV, on a cluster with one i3.2xlarge master and eight i3.2xlarge workers (with results averaged over 3 runs). Spark's performance writing to Delta Lake is similar to writing to Parquet, showing that statistics collection does not add a significant overhead over the other data loading work.

Closing Remarks

Delta's data skipping may be useless if you read the whole table with SELECT * FROM <TABLE_NAME>. But not every query in the world is a full scan, and figuring out how to improve performance for those specific queries seems to be the job of any data engineer who has adopted Delta Lake.

Studying a framework in depth seems to mean not only building expertise in that framework but also learning the techniques it adopted to solve its problems. I often ask myself, "What if I had to redesign a large-scale distributed processing system?", and I think of this as the process of learning which techniques to reach for when that moment comes. And more than any particular technical detail, thinking about whether a technique approaches its problem sensibly and is designed well seems valuable in itself.

Reference