5 minute read

2025๋…„ ์ €๋Š” ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ๋žฉ์— ์ปจํƒํ•˜์—ฌ ํ•™๋ถ€ ์กธ์—… ์—ฐ๊ตฌ ์ฃผ์ œ๋ฅผ ๋ฐ›์•„์„œ ์ง„ํ–‰ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ €์˜ ์ฃผ์ œ๋Š” โ€œContinuous Subgraph Matchingโ€๊ณผ ๊ด€๋ จํ•ด ์ฝ”๋“œ๋ฅผ ์ตœ์ ํ™” ํ•˜๊ณ  ๊ฐœ์„ ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๊ทธ๋ž˜ํ”„ ์ฟผ๋ฆฌ ๊ด€๋ จ ๋…ผ๋ฌธ์„ ์ฝ๊ณ , C++ ์ฝ”๋“œ๋ฅผ ํŠœ๋‹ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์กธ์—… ๋งˆ์ง€๋ง‰ ํ•™๊ธฐ์— ๋“ฃ๋Š” ์ˆ˜์—…์ธ๋ฐ ๋งŽ์€ ๋…ธํ•˜์šฐ์™€ ๊ฒฝํ—˜์„ ์Œ“๊ณ  ์กธ์—…ํ•˜๊ธฐ๋ฅผ ๊ธฐ๋Œ€ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค ใ…Žใ…Ž

๋“ค์–ด๊ฐ€๋ฉฐ

์กธ์—…์—ฐ๊ตฌ๋ฅผ ํ•˜๋ฉด์„œ ๊ณ๋‹ค๋ฆฌ๋กœ ์ฐพ์•„๋ณธ ๋‚ด์šฉ๋“ค, ๊ทธ๋ฆฌ๊ณ  ๋…ผ๋ฌธ์—์„œ ์ž ๊น ์–ธ๊ธ‰๋œ ๋‚ด์šฉ๋“ค์„ ๋ชจ์•„์„œ ์ด ํฌ์ŠคํŠธ์— ๋ชจ์•„๋‘ก๋‹ˆ๋‹ค.

Sideways Information Passing

SQL์—์„œ ์„ฑ๋Šฅ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•ด ์ˆ˜ํ–‰ํ•˜๋Š” ํ…Œํฌ๋‹‰ ์ž…๋‹ˆ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด, Orders(order_id, customer_id, data) ํ…Œ์ด๋ธ”๊ณผ Customers(customer_id, name, country) ํ…Œ์ด๋ธ”์ด ์žˆ๊ณ , ์•„๋ž˜์™€ ๊ฐ™์€ ์ฟผ๋ฆฌ๋ฅผ ์ž‘์„ฑํ•˜์˜€์Šต๋‹ˆ๋‹ค.

SELECT *
FROM Orders O
JOIN Customers C ON O.customer_id = C.customer_id
WHERE C.country = 'Korea';

์œ„์˜ ์ฟผ๋ฆฌ๋ฅผ ํ…Œ์ด๋ธ” Join์„ ๋จผ์ € ์ˆ˜ํ–‰ํ•˜๊ณ , ์กฐ์ธ ๊ฒฐ๊ณผ์— ๋Œ€ํ•ด C.country = 'Korea' ์กฐ๊ฑด์„ ์ ์šฉ ํ•ฉ๋‹ˆ๋‹ค.

SIP ํ…Œํฌ๋‹‰์€ WHERE์ ˆ์„ ์กฐ์ธ ๋Œ€์ƒ์ธ Customers ํ…Œ์ด๋ธ”์— ๋จผ์ € ์ ์šฉํ•˜๊ณ , ํ…Œ์ด๋ธ” ์กฐ์ธ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ํ…Œ์ด๋ธ” ์กฐ์ธ์—์„œ $N \times M$์ด ์•„๋‹ˆ๋ผ $N \times (M - \alpha)$๊ฐ€ ๋˜๊ธฐ ๋•Œ๋ฌธ์— ๋” ์ ์€ ์กฐ์ธ ์—ฐ์‚ฐ์œผ๋กœ ๊ฐ™์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!


SIP๋Š” ์ž๋™์œผ๋กœ ๋˜๋Š”๊ฒจ? ์•„๋‹˜ ์‚ฌ๋žŒ์ด ํ•ด์•ผ ํ•˜๋Š”๊ฒจ?

์ž๋™์œผ๋กœ ๋˜๋Š” ๊ฑฐ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค! ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์˜ ์˜ตํ‹ฐ๋งˆ์ด์ €๋‚˜ โ€œ๋‚ด๋ถ€์ ์œผ๋กœโ€ ์ˆ˜ํ–‰ํ•˜๋Š” ์ฟผ๋ฆฌ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๋‹ค๋งŒ, ์‚ฌ์šฉ์ž๋„ ์ฟผ๋ฆฌ ์˜ตํ‹ฐ๋งˆ์ด์ €๊ฐ€ ๊ตฌ์กฐ๋ฅผ ์ž˜ ์ธ์‹ํ•  ์ˆ˜ ์žˆ๋„๋ก ์ฟผ๋ฆฌ ์ž์ฒด๋ฅผ ์ž˜ ์ž‘์„ฑ ํ•ด์ค˜์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์‹ ๊ธฐํ•˜๊ฒŒ๋„ Apache Spark์—๋„ SIP ์ตœ์ ํ™”๊ฐ€ ์ ์šฉ ๋˜์–ด ์žˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค! Spark์˜ โ€œCatalyst Optimizerโ€๋Š” SIP๋ฅผ ํฌํ•จํ•ด ๋‹ค์–‘ํ•œ ๋…ผ๋ฆฌ/๋ฌผ๋ฆฌ ์ฟผ๋ฆฌ ์ตœ์ ํ™”๋ฅผ ์ˆ˜ํ–‰ํ•ด์ค๋‹ˆ๋‹ค.

vs. Predicate Pushdown

SIP์„ ์•Œ์•„๋ณด๋ฉด์„œ ํ—ท๊ฐˆ๋ ธ๋˜๊ฒŒ, Predicate Pushdown ์ตœ์ ํ™” ์˜€์Šต๋‹ˆ๋‹ค.

Predicate Pushdown์€ ์•„๋ž˜์™€ ๊ฐ™์€ ์ฟผ๋ฆฌ์—์„œ ์ˆ˜ํ–‰ ๋˜๋Š”๋ฐ,

SELECT *
FROM Customers
WHERE country = 'Korea';

์ฟผ๋ฆฌ๋ฅผ ๋ฐ์ดํ„ฐ๊ฐ€ ํŒŒ์ผ/ํ…Œ์ด๋ธ” ์Šค์บ” ํ•˜๋Š” ๊ณผ์ •์—์„œ WHERE ์กฐ๊ฑด์„ ๋จผ์ € ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ๋ฉ”๋ชจ๋ฆฌ์— ํ•„์š”ํ•œ ํ–‰๋งŒ ๋กœ๋“œํ•ด์„œ ๋ฆฌ์†Œ์Šค๋ฅผ ์•„๋‚„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!

๋ฐ˜๋ฉด์—, SIP๋Š” ์กฐ์ธ ๊ณผ์ •์—์„œ ์ด๋ค„์ง€๋Š” ์ตœ์ ํ™” ์ž…๋‹ˆ๋‹ค. ์กฐ์ธ ๊ฒฐ๊ณผ์— WHERE์„ ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š๊ณ , ์†Œ์Šค ํ…Œ์ด๋ธ”์— WHERE์„ ๋„˜๊ฒจ๋ฒ„๋ฆฌ๋Š” ๊ฒƒ์ด SIP์˜ ์ปจ์…‰ ์ž…๋‹ˆ๋‹ค.

๋‘ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์€ ์„œ๋กœ ์ƒ์ถฉ ๋˜๋Š” ๊ฒƒ์ด ์•„๋‹™๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ SIP๊ฐ€ ์ ์šฉ๋œ ํ›„์— ์†Œ์Šค ํ…Œ์ด๋ธ”์— ๋Œ€ํ•ด์„œ Predicate Pushdown์ด ์ผ์–ด๋‚˜๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅ ํ•ฉ๋‹ˆ๋‹ค.

Database Algebra

๋†€๋ž๊ฒŒ๋„โ€ฆ ์ œ๊ฐ€ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์ˆ˜์—…์„ ์•ˆ ๋“ฃ๊ณ  ์กธ์—…์„ ํ•ฉ๋‹ˆ๋‹คโ€ฆ ใ…‹ใ…‹ ๊ทธ๋ž˜์„œ ๋…ผ๋ฌธ์˜ ํ‘œ๊ธฐ ์ค‘์— ์ฒจ๋ณด๋Š” ๊ฒƒ๋“ค์ด ์žˆ์–ด์„œ ์ •๋ฆฌํ•ฉ๋‹ˆ๋‹ค ใ…‹ใ…‹ (์ •๋ณด์ฒ˜๋ฆฌ์‚ฐ์—…๊ธฐ์‚ฌ ์ค€๋น„ํ•  ๋•Œ ๋ณด๊ธด ํ–ˆ์—ˆ๋„ค์š”)

Projection $\pi$

SQL์˜ SELECT ์—ฐ์‚ฐ ๊ฐ™์€ ๊ฒƒ

Joins

  • Theta Join $\Join_{\theta}$
    • Join ์กฐ๊ฑด์„ ON ์ ˆ์— ์ง์ ‘ ๋ช…์‹œํ•˜๋Š” ์กฐ์ธ ์ž…๋‹ˆ๋‹ค.
  • Natural Join $\Join$
    • ๋‘ ์†Œ์Šค ํ…Œ์ด๋ธ”์— ์žˆ๋Š” ๊ณตํ†ต ์ด๋ฆ„&ํƒ€์ž… ์ปฌ๋Ÿผ์„ ๊ธฐ์ค€์œผ๋กœ ์•Œ์•„์„œ ์กฐ์ธ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

MVCC, Multi-version Concurrency Control

๋ฐ์ดํ„ฐ๋ฒ ์ž‰์Šค๋‚˜ ๋™์  ์‹œ์Šคํ…œ์—์„œ โ€œ๋™์‹œ์„ฑโ€์„ ์ œ์–ดํ•˜๊ธฐ ์œ„ํ•ด์„œ ์—ฌ๋Ÿฌ ๋ฒ„์ „์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋™์‹œ์— ์œ ์ง€ํ•˜๋Š” ํ…Œํฌ๋‹‰ ์ž…๋‹ˆ๋‹ค.

TurboFlux(2018) ๋…ผ๋ฌธ์—์„œ CSM ๋ฌธ์ œ๋ฅผ ๋ถ„์‚ฐ์ฒ˜๋ฆฌ ํ™˜๊ฒฝ์—์„œ ๋‹ค๋ฃจ๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์–ธ๊ธ‰ํ•  ๋•Œ, ์ž ๊น ๋‚˜์™”์Šต๋‹ˆ๋‹ค.

ํšŒ์‚ฌ์—์„œ Delta-lake ํฌ๋งท์„ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๋Š”๋ฐ, ์ด๊ฒŒ ๋Œ€ํ‘œ์ ์ธ MVCC ํ”„๋ ˆ์ž„์›Œํฌ๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

Dataset

์กธ์—… ํ”„๋กœ์ ํŠธ ์‹คํ—˜์— ์‚ฌ์šฉํ•œ ๋ฐ์ดํ„ฐ์…‹์˜ ๋ชฉ๋ก ์ž…๋‹ˆ๋‹ค.

  • LSBench
    • Undirected, Labeled
    • Directed, Labeled
  • Livejournal
    • Undirected, Unlabeled
  • Netflow
    • Directed, Labeled

Labeled vs. Unlabeled

Labeled์™€ Unlabeled์˜ ์ฐจ์ด๋Š” Edge์— ๋ผ๋ฒจ์ด ์žˆ๋Š”์ง€ ์—ฌ๋ถ€ ์ž…๋‹ˆ๋‹ค.

@lsbench (labeled)
v 0 0
v 1 0
v 2 0
v 3 0
v 4 0
v 5 0
e 0 3 33
e 1 2 42
e 2 3 42
e 3 4 41
e 3 5 32

LSBench๋Š” ์—ฃ์ง€์— ๋ผ๋ฒจ์ด ๋ถ€์—ฌ ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค: e v1, v2 label

@livejournal (unlabeled)
v 0 26
v 1 8
v 2 9
v 3 26
v 4 8
v 5 14
e 0 2 0
e 0 3 0
e 0 5 0
e 1 3 0
e 3 4 0

livejournal์€ ์—ฃ์ง€์—๋Š” ๋ผ๋ฒจ์ด ์—†์ง€๋งŒ, ๋…ธ๋“œ์—๋Š” ๋ผ๋ฒจ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.