Valorant, Marvel Snap's Architecture, AWS Serverless Trends, and K8s Pod/Node Scaling


Following last year, I went to "Games on AWS" (GoW) again this year. It had always been held at COEX in Samseong, but this time it was in Pangyo!

Taking the subway early in the morning for the first time in ages was exhausting... T_T

The sessions were all game-related topics overall, but hmm, unexpectedly it felt like there was less worth listening to than at last year's GoW... ;;

Still, I enjoyed the sessions I did attend, so here is a summary of what they covered and what I thought.


Keynote Session

Marvel Snap

Second Dinner, the studio behind "Marvel Snap", took part! It's a TCG-style game built on the Marvel IP, and back when I commuted by subway I saw so many people playing it on the way to work lol

The keynote itself only covered an introduction to Second Dinner and Marvel Snap, their game development cycle, and a bit of "AWS is great~". It got me interested enough to attend their afternoon session, and that one was really fun haha

Riot Games

๋˜ ๋†€๋ž๊ฒŒ๋„! Riot Games์—์„œ๋„ GoW ํ‚ค๋…ธํŠธ ๋ฐœํ‘œ๋ฅผ ํ–ˆ๋‹ค! LoL๋กœ ์œ ๋ช…ํ•œ Riot Games์ด์ง€๋งŒ, ์„ธ์…˜์—์„œ ์žฌ๋ฐŒ๊ฒŒ ๋“ค์—ˆ๋˜ ๋ถ€๋ถ„์€ ๋ฐœ๋กœ๋ž€ํŠธ(Valorant) ๊ฐœ๋ฐœ ์ด์•ผ๊ธฐ ์˜€๋‹ค.

โ€œPeekerโ€™s Advantagesโ€๋ž€ ๋‚ด์šฉ์— ๋Œ€ํ•ด ์„ค๋ช…ํ–ˆ๋Š”๋ฐ ์š”๊ฒŒ ์ •๋ง ์žฌ๋ฐŒ์—ˆ๋‹ค. FPS ์žฅ๋ฅด์—์„œ ๋ฐœ์ƒํ•˜๋Š” ํ˜„์ƒ์ธ๋ฐ, ๊ทธ ์ด์œ ๋Š” ์„œ๋ฒ„์™€ ์ƒํƒœ๋ฅผ ์‹ฑํฌํ•˜๋Š” ๊ณผ์ •์—์„œ ๋ฐœ์ƒํ•˜๋Š” Latency ๋•Œ๋ฌธ์ด๋‹ค.

peeker๊ฐ€ ์ด๋™ํ–ˆ๋‹ค๋Š” ์ •๋ณด๊ฐ€ ์„œ๋ฒ„์— ์ €์žฅ๋œ ํ›„์— holder๊ฐ€ ์„œ๋ฒ„๋กœ๋ถ€ํ„ฐ peeker์˜ ์œ„์น˜ ์ •๋ณด๋ฅผ ๋ฐ›์•„์˜ค๊ธฐ ๋•Œ๋ฌธ์— latency๊ฐ€ ์–ด์ฉ” ์ˆ˜ ์—†์ด ๋ฐœ์ƒํ•œ๋‹ค. ๋‹จ, ๋ฐœ๋กœ๋ž€ํŠธ์—์„œ ์ด๋Ÿฐ peeker latency๋Š” ํ‰๊ท  40~70 ms ์ˆ˜์ค€์ด๋ผ๊ณ  ํ•˜๋ฉฐ, ์ธ๊ฐ„์˜ ํ‰๊ท  ๋ฐ˜์‘ ์†๋„๊ฐ€ 240 ms ์ˆ˜์ค€์ด๊ธฐ ๋•Œ๋ฌธ์— Peekerโ€™s Advantage๋กœ ์–ป๋Š” ํšจ๊ณผ๋ฅผ ๊ทธ๋ฆฌ ํฌ์ง€ ์•Š์„ ์ˆ˜ ์žˆ๋‹ค ใ…‹ใ……ใ…‹
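To put those two numbers side by side, here is a quick back-of-the-envelope calculation. It uses only the figures quoted in the session; the variable and function names are my own.

```python
# Figures quoted in the session; names are illustrative.
PEEKER_LATENCY_MS = (40, 70)   # reported average range in Valorant
HUMAN_REACTION_MS = 240        # reported average human reaction time


def latency_share(latency_ms: float, reaction_ms: float) -> float:
    """Peeker latency expressed as a fraction of the holder's reaction time."""
    return latency_ms / reaction_ms


shares = [latency_share(ms, HUMAN_REACTION_MS) for ms in PEEKER_LATENCY_MS]
for ms, share in zip(PEEKER_LATENCY_MS, shares):
    print(f"{ms} ms is {share:.0%} of a {HUMAN_REACTION_MS} ms reaction")
```

So the head start is roughly a sixth to a bit under a third of one reaction time: noticeable, but not decisive on its own.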

Others

Talks from Neowiz and NC followed. Neowiz shared their experience of migrating from on-premises infrastructure to AWS over several years, and from NC what stuck with me was the promotion of "Varco LLM", the LLM they developed in-house. I think they said NC's LLM is available through AWS Bedrock; this was the first time I had seen AWS Bedrock, and it made me want to try it haha

The AWS tech blog even has an example of building a Korean chatbot with NC's Varco LLM, wow


Marvel Snap - Building the Mobile Game of the Year with Only AWS Serverless

Marvel Snap, the game I saw so often on the subway haha. Someone at work plays it too, so I had watched over their shoulder, and it really is a well-made game!! It was apparently made by former Hearthstone developers, and wow... if only Hearthstone had been made like this... T_T

The most interesting part was that they run the game from start to finish entirely on "Serverless"!! Even granting Serverless its advantages, there must surely be limits... hmm...

Provisioned Concurrency

For important features, Marvel Snap reportedly applies "Provisioned Concurrency" so that responses stay fast even on Serverless. In other words, it is a way to deal with Lambda's "Cold Start".

Of course, because it keeps Lambda containers on standby, it costs more than the default mode.
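For reference, here is a minimal sketch of turning this on through the Lambda API. It is not from the session: the helper name is mine, the function and alias names in the usage comment are hypothetical, and the `client` argument is expected to be a boto3 Lambda client (or anything with the same method).

```python
from typing import Any, Dict


def enable_provisioned_concurrency(client: Any, function_name: str,
                                   alias: str, count: int) -> Dict[str, Any]:
    """Ask Lambda to keep `count` execution environments initialized."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,  # a published version or alias; $LATEST is not allowed
        ProvisionedConcurrentExecutions=count,
    )


# Usage with real AWS credentials (names below are hypothetical):
#   import boto3
#   enable_provisioned_concurrency(boto3.client("lambda"), "matchmaking", "live", 10)
```

Passing the client in keeps the helper easy to try against a stub before pointing it at a real account.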

Lambda Optimization: Static Initialization

Also, with Serverless, a micro container(?) apparently stays alive internally after the first invocation, so response times drop sharply from the next call onward. That means how you structure your Lambda code affects the response time of those later calls.

For example, the two snippets below differ in execution time on the 2nd request.

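import database  # hypothetical DB client module, for illustration only

# 1st code: connects inside the function, so every invocation pays the connection cost
def returnUsersVer1():
  db = database.connect(...)
  return db.getUsers()

# 2nd code: connects once at module load (static initialization);
# later invocations in the same execution environment reuse db
db = database.connect(...)

def returnUsersVer2():
  return db.getUsers()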

The difference between the two is whether db is a local variable inside the function or a global variable. AWS Lambda preserves the values of global variables between two invocations that share the same execution environment, so even if both functions take the same time on the first request, from the second request onward the Lambda code that keeps db as a global variable is faster!
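The effect is easy to reproduce locally without AWS. The sketch below is my own illustration: `FakeDatabase` is a stand-in that just counts connections, and calling a handler in a loop plays the role of warm invocations in one execution environment.

```python
class FakeDatabase:
    """Stand-in for a real DB client; counts how often we connect."""
    connect_calls = 0

    @classmethod
    def connect(cls) -> "FakeDatabase":
        cls.connect_calls += 1  # a real connect would cost network round trips
        return cls()

    def get_users(self) -> list:
        return ["alice", "bob"]


def handler_local(event=None, context=None) -> list:
    db = FakeDatabase.connect()  # reconnects on every invocation
    return db.get_users()


DB = FakeDatabase.connect()  # runs once, when the module is first loaded


def handler_global(event=None, context=None) -> list:
    return DB.get_users()  # warm invocations reuse the existing connection


def connects_for(handler, invocations: int) -> int:
    """Count connect() calls made while invoking `handler` repeatedly."""
    before = FakeDatabase.connect_calls
    for _ in range(invocations):
        handler()
    return FakeDatabase.connect_calls - before


print(connects_for(handler_local, 3), connects_for(handler_global, 3))  # prints: 3 0
```

The local version connects once per call, while the global version makes no new connections after the initial load.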

AWS GameLift FlexMatch: Serverless Matching

In a TCG you play head-to-head against an opponent, so who you get matched with matters. For example, if a beginner is matched against a veteran, the beginner will lose motivation right from the start.

Traditional matchmaking games use a dedicated matchmaking server, a microservice, or a commercial matchmaking service. Marvel Snap, however, solved even matchmaking with a Serverless service!!

With AWS GameLift FlexMatch you can provide sophisticated matchmaking through configuration alone. For example, you can specify a rule set like the one below.

// Source: https://docs.aws.amazon.com/ko_kr/gamelift/latest/flexmatchguide/match-examples.html
"rules": [{
    "name": "FairTeamSkill",
    "description": "The average skill of players in each team is within 10 points from the average skill of all players in the match",
    "type": "distance",
    // get skill values for players in each team and average separately to produce list of two numbers
    "measurements": [ "avg(teams[*].players.attributes[skill])" ],
    // get skill values for players in each team, flatten into a single list, and average to produce an overall average
    "referenceValue": "avg(flatten(teams[*].players.attributes[skill]))",
    "maxDistance": 10 // minDistance would achieve the opposite result
}, {
    "name": "EqualTeamSizes",
    "description": "Only launch a game when the number of players in each team matches, e.g. 4v4, 5v5, 6v6, 7v7, 8v8",
    "type": "comparison",
    "measurements": [ "count(teams[cowboys].players)" ],
    "referenceValue": "count(teams[aliens].players)",
    "operation": "=" // other operations: !=, <, <=, >, >=
}],

Looking at the rules, measurements and referenceValue contain query-like expressions (FlexMatch property expressions, not actual SQL), so you can provide an appropriate matchmaking experience based on each player's attributes!
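To make the two rules concrete, here is a simplified Python sketch of what they check. This is my own reading of the rule descriptions, not how FlexMatch is implemented; the function names and sample skill values are made up.

```python
from statistics import mean
from typing import Dict, List

Teams = Dict[str, List[float]]  # team name -> skill value of each player


def fair_team_skill(teams: Teams, max_distance: float = 10) -> bool:
    """Every team's average skill is within max_distance of the overall average."""
    team_averages = [mean(skills) for skills in teams.values()]      # measurements
    overall = mean(s for skills in teams.values() for s in skills)   # referenceValue
    return all(abs(avg - overall) <= max_distance for avg in team_averages)


def equal_team_sizes(teams: Teams) -> bool:
    """All teams have the same number of players."""
    return len({len(skills) for skills in teams.values()}) == 1


balanced = {"cowboys": [50, 60, 70], "aliens": [55, 65, 75]}
lopsided = {"cowboys": [10, 20, 30], "aliens": [80, 90, 100]}
print(fair_team_skill(balanced), fair_team_skill(lopsided))  # prints: True False
```

In the balanced case both team averages (60 and 65) sit within 10 points of the overall average of 62.5; in the lopsided case they are 35 points away.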

์ดํ‰

Marvel Snap์˜ Serverless ์•„ํ‚คํ…์ฒ˜๋Š” AWS SA๊ฐ€ ๊ทน์ฐฌํ•  ์ •๋„์˜ ์•„ํ‚คํ…์ณ ์˜€๋‹ค. ์–ด์ฉŒ๋ฉด ์ดˆ๊ธฐ ๊ฒŒ์ž„, ๋ณ€๋™์„ฑ์ด ๊ฐ•ํ•œ ๊ฒŒ์ž„์„ ๊ธฐํšํ•˜๊ณ  ์žˆ๋‹ค๋ฉด, ๊ธฐ์กด์˜ on-demand ๋ชจ๋ธ๋ณด๋‹ค๋„ Sererless ๋ชจ๋ธ์ด ๋” ์ ํ•ฉํ•  ๊ฒƒ ๊ฐ™๋‹ค๋Š” ์ƒ๊ฐ์„ ํ•˜๊ฒŒ ๋  ์ •๋„์˜€๋‹ค.

์„ธ์…˜์—์„œ ๋งํ•˜๊ธธ ํด๋ผ์šฐ๋“œ ์„œ๋น„์Šค๊ฐ€ ์ฒ˜์Œ ๋“ฑ์žฅ ํ–ˆ์„ ๋•Œ, ๊ฐ€์žฅ ๋จผ์ € ์ธํ”„๋ผ๋ฅผ ์˜จํ”„๋ ˆ๋ฏธ์Šค์—์„œ ํด๋ผ์šฐ๋“œ๋ฅผ ์ฑ„ํƒํ•œ ์—…๊ณ„๊ฐ€ ๊ฒŒ์ž„ ๋ถ„์•ผ์˜€๋‹ค๊ณ  ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ Serverless์— ๋Œ€ํ•ด์„œ๋Š” ์˜คํžˆ๋ ค ๊ฒŒ์ž„ ์—†๊ณ„๊ฐ€ ์ง„์ „์ด ๋Š๋ฆฌ๊ณ , ์ตœ๊ทผ์— ํด๋ผ์šฐ๋“œ ์ธํ”„๋ผ๋ฅผ ๊ตฌ์ถ•ํ•˜๋Š” ์‹ ๊ทœ ์—…๊ณ„๊ฐ€ ๋” Serverless ๋ชจ๋ธ์„ ์ฑ„ํƒํ•˜๋Š” ์ถ”์„ธ๋ผ๊ณ  ํ•œ๋‹ค.

์ด๋ฒˆ Marvel Snap์˜ ์„ธ์…˜์œผ๋กœ Serverless์— ๋Œ€ํ•œ ๋‚ด ์ธ์‹์ด ํฌ๊ฒŒ ๋ฐ”๋€ ๊ฒƒ ๊ฐ™๋‹ค. ์–ด์ฉŒ๋ฉด ๋‹ค์Œ ์‚ฌ๋‚ด GameJam์—์„œ๋Š” Serverless ๋ชจ๋ธ์„ ์ฑ„ํƒํ•˜๊ฒŒ ๋  ๊ฒƒ ๊ฐ™๋‹ค ใ…Žใ…Ž


How Riot Games Uses EKS

This was a talk on how Riot, the hottest game company these days, uses EKS. Maybe because of the Riot name, it was held in the largest hall and was still packed!!

There were several topics, but the one that stuck with me was EKS scaling.

์ˆ˜ํ‰์  ํŒŒ๋“œ ํ™•์žฅ(HPA)

ํšŒ์‚ฌ์—์„œ ์—ฌ๋Ÿฌ K8s Cluster์™€ Object๋ฅผ ์šด์˜ํ•˜๋ฉด์„œ, ์š”์ฆ˜ Autoscaling์— ๋Œ€ํ•œ ๊ด€์‹ฌ์ด ์ƒ๊ฒผ๋‹ค. K8s์—์„œ AutoScaling์„ ์ง€์›ํ•˜๋Š”๊ฒŒ ์—ฌ๋Ÿฌ ๋ฐฉ์‹์ด ์žˆ์ง€๋งŒ ๊ฐ€์žฅ ๋จผ์ € ๋– ์˜ค๋ฅธ๊ฑด ์š” โ€œHPAโ€๋‹ค.

๊ฐ€์žฅ K8s-nativeํ•œ AutoScaling ๋ฐฉ์‹์ธ HPA๋Š” HorizontlPodAutoscaler ์ปจํŠธ๋กค๋Ÿฌ๋ฅผ ํ†ตํ•ด K8s Deployment๋ฅผ ์ž๋™์œผ๋กœ ์—…๋ฐ์ดํŠธ ํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.

K8s 1.23๋ถ€ํ„ฐ ๋„์ž…๋œ ๊ธฐ๋Šฅ์œผ๋กœ CPU, Memory ์ง€ํ‘œ์— ๋”ฐ๋ผ ๋ ˆํ”Œ๋ฆฌ์นด ๊ฐฏ์ˆ˜๋ฅผ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.
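The Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1 (0.1 by default). A minimal sketch of just that formula, with a function name of my own and none of the controller's other behavior (stabilization windows, pod readiness, min/max bounds):

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the basic HPA ratio formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave as is
    return math.ceil(current_replicas * ratio)


# 4 replicas at 90% average CPU with a 60% target
print(hpa_desired_replicas(4, 90, 60))  # prints: 6
```

Rounding up means the controller errs on the side of slightly too many Pods rather than too few.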

K8s Event-driven Autoscaling(KEDA)

Similar in concept to K8s HPA, there is also KEDA (K8s Event-driven Autoscaling). From what I found, it seems to be adopted when you need horizontal scaling driven by a wider variety of signals than K8s HPA offers.

Out of the box, K8s HPA runs autoscaling on CPU and memory figures (custom and external metrics require a separate metrics adapter), whereas with KEDA you can autoscale on AWS SQS metrics, web traffic events, Cron, and even PromQL! For truly fine-grained horizontal scaling, adopting KEDA looks practically essential.

I think we did a PoC at work once, but it isn't used all that actively T_T
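As a rough mental model of event-driven scaling on a queue, the sketch below sizes a Deployment from the queue backlog. It is a simplification of my own: real KEDA feeds such metrics to an HPA and adds polling intervals and cooldowns, and the parameter names and default bounds here are arbitrary.

```python
import math


def replicas_for_queue(queue_length: int, target_per_replica: int,
                       min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replicas needed so each one handles about target_per_replica messages."""
    if target_per_replica < 1:
        raise ValueError("target_per_replica must be at least 1")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))  # clamp to the bounds


for backlog in (0, 12, 1000):
    print(backlog, "->", replicas_for_queue(backlog, target_per_replica=5))
```

An empty queue scales to zero, 12 messages need 3 replicas, and a flood of 1000 messages is capped at the maximum of 10.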

Node Scaling: K8s CAS vs. Karpenter

์œ„์˜ HPA์™€ KEDA๋Š” K8s Deployment์—์„œ Pod ์ˆ˜๋ฅผ ๋™์ ์œผ๋กœ ์กฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ๊ทธ๋Ÿฐ๋ฐ, K8s Cluster์˜ ๋…ธ๋“œ ์ˆ˜๋ฅผ ๋™์ ์œผ๋กœ ์กฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ์žˆ์„๊ฐ€? ์žˆ๋‹ค!

K8s์—์„œ๋Š” ์ด๋ฏธ Node Auto Scaling ๊ธฐ๋Šฅ โ€œCluster AutoScaler(CAS)โ€๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ๋‹ค.

K8s ํด๋Ÿฌ์Šคํ„ฐ์˜ Cloud Provider์— ๋”ฐ๋ผ CAS์˜ ๋™์ž‘ ๋ฐฉ์‹์ด ๋‹ค๋ฅด์ง€๋งŒ, AWS์—์„  EC2 Auto Scaling Group์„ ์‚ฌ์šฉํ•ด Node Group ๋‹จ์—์„œ Auto Scaling์ด ์ด๋ค„์ง„๋‹ค. ํด๋Ÿฌ์Šคํ„ฐ์˜ cluster-autoscaler๊ฐ€ Pod์˜ ์ƒํƒœ๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋ง ํ•˜๋ฉด์„œ, node ์„ ์ •์ด ์ง€์†์ ์œผ๋กœ ์‹คํŒจํ•œ๋‹ค๋ฉด, Node Group์„ ํ†ตํ•ด ์‹ ๊ทœ Worker ๋…ธ๋“œ๊ฐ€ ์ถ”๊ฐ€๋œ๋‹ค. ์ด๋•Œ, ์ถ”๊ฐ€๋˜๋Š” Worker ๋…ธ๋“œ๋Š” Node Grouop์— ์ •์˜๋œ ํ…œํ”Œ๋ฆฟ์˜ instance ์ŠคํŽ™์„ ๋”ฐ๋ฅธ๋‹ค!
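The key constraint is that every new node has the same shape. A toy sketch of that idea, assuming a single Node Group and a simple first-fit packing of pending Pods; the names and numbers are hypothetical and this is not the actual cluster-autoscaler algorithm:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeTemplate:
    cpu: float         # vCPUs per node in this Node Group
    memory_gib: float  # memory per node in this Node Group


def nodes_needed(pending: List[Tuple[float, float]], template: NodeTemplate) -> int:
    """How many template-sized nodes it takes to place all pending (cpu, mem) requests."""
    free: List[Tuple[float, float]] = []  # remaining (cpu, mem) on each new node
    for cpu, mem in sorted(pending, reverse=True):
        if cpu > template.cpu or mem > template.memory_gib:
            # Sketch choice: a real autoscaler would just leave this Pod pending.
            raise ValueError("pod does not fit the Node Group's template")
        for i, (free_cpu, free_mem) in enumerate(free):
            if cpu <= free_cpu and mem <= free_mem:
                free[i] = (free_cpu - cpu, free_mem - mem)
                break
        else:
            free.append((template.cpu - cpu, template.memory_gib - mem))
    return len(free)


print(nodes_needed([(2, 4), (2, 4), (2, 4)], NodeTemplate(cpu=4, memory_gib=16)))  # prints: 2
```

Three 2-vCPU Pods need two 4-vCPU nodes here, and the second node ends up half empty because the template size is fixed.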

So why did a node scaler called Karpenter appear?

Karpenter likewise adds a new node when there is no suitable node for a Pod. However, instead of adding nodes based on a NodeGroup, Karpenter can create and use "the cheapest node that fits the Pod's capacity"!! On top of that, when launching a new node, it can reportedly try a Spot Instance first and switch straight to On-Demand if that fails!
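Both ideas fit in a few lines. The sketch below is only an illustration of the concept, not Karpenter's API: the instance catalog, its prices, and the `launch` callback are all made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: float
    memory_gib: float
    hourly_price: float


CATALOG = [  # made-up names and prices, for illustration only
    InstanceType("small", 2, 4, 0.05),
    InstanceType("medium", 4, 16, 0.10),
    InstanceType("large", 8, 32, 0.20),
]


def cheapest_fit(cpu: float, memory_gib: float,
                 catalog: Sequence[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Cheapest instance type that satisfies the Pod's requests, if any."""
    candidates = [t for t in catalog if t.cpu >= cpu and t.memory_gib >= memory_gib]
    return min(candidates, key=lambda t: t.hourly_price, default=None)


def launch_with_fallback(launch: Callable[[str], bool],
                         capacity_types: Sequence[str] = ("spot", "on-demand")) -> Optional[str]:
    """Try each capacity type in order; return the first that launches."""
    for capacity_type in capacity_types:
        if launch(capacity_type):
            return capacity_type
    return None


print(cheapest_fit(3, 8).name)                             # prints: medium
print(launch_with_fallback(lambda c: c == "on-demand"))    # prints: on-demand
```

Compared with a fixed Node Group template, the node shape is chosen per request, which is where the cost savings come from.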

Beyond that, many people rate it as faster than K8s CAS at adding and removing nodes (within 1 minute), so it is seen as a JIT (Just-in-Time) scaler.

However, since K8s CAS and Karpenter both act as node scalers, using the two together is reportedly not recommended.

Our team probably does node scaling with K8s CAS and NodeGroups, but it would be nice to try Karpenter someday haha


Wrap-up: Conferences Are Fun

AWS conferences, as always, seem to be a place I leave with a lot of inspiration. They make me check whether our team is doing well right now, what new features we could bring in, and which direction we should head.

After the conference, Databricks hosted an After Party, so I went. I sat with people from my company, but I still got to hear a lot from other development teams and ask plenty of things I had been curious about haha I got a ton of inspiration here too haha Oh, and the free beer and unlimited menu were great 🐷

Getting to Pangyo wasn't easy, but I got to soak in the atmosphere of the companies there haha Touring the office buildings of the dev companies was especially fun lol So last weekend I went to Pangyo once more, looked around the area, and had a picnic at Hwarang Park 🧺
