2021-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜๊ณผ์ œ์—ฐ๊ตฌโ€™ ์ˆ˜์—…์—์„œ ์ง„ํ–‰ํ•˜๋Š” ๊ฐœ์ธ ํ”„๋กœ์ ํŠธ๋ฅผ ์œ„ํ•ด ๊ฐœ์ธ์ ์œผ๋กœ ์ •๋ฆฌํ•œ ํฌ์ŠคํŠธ์ž…๋‹ˆ๋‹ค. ์ง€์ ๊ณผ ๊ต๋ฅ˜๋Š” ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)


2021-1ํ•™๊ธฐ, ๋Œ€ํ•™์—์„œ โ€˜๊ณผ์ œ์—ฐ๊ตฌโ€™ ์ˆ˜์—…์—์„œ ์ง„ํ–‰ํ•˜๋Š” ๊ฐœ์ธ ํ”„๋กœ์ ํŠธ๋ฅผ ์œ„ํ•ด ๊ฐœ์ธ์ ์œผ๋กœ ์ •๋ฆฌํ•œ ํฌ์ŠคํŠธ์ž…๋‹ˆ๋‹ค. ์ง€์ ๊ณผ ๊ต๋ฅ˜๋Š” ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

Model Design

๋จผ์ € ์…‹์—…์„ ์ข€ ์‚ดํŽด๋ณด์ž.

ํ–‰๋ ฌ $R \in \mathbb{R}^{N_u \times N_i}$์€ ํ‰์  ํ–‰๋ ฌ๋กœ, ๊ฐ ์œ ์ €๊ฐ€ ์•„์ดํ…œ์— ๋Œ€ํ•ด ๋งค๊ธด ํ‰์  ์ •๋ณด๊ฐ€ ๋“ค์–ด์žˆ๋‹ค. ์ด๋•Œ, $N_u$๋Š” ์ด ์œ ์ €์˜ ์ˆ˜, $N_i$๋Š” ์ด ์•„์ดํ…œ์˜ ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค.

์ด์ œ ์ด ํ–‰๋ ฌ $R$๋ฅผ <latent factor matrix> $X$, $Y$๋กœ ๋ถ„ํ•ดํ•ด๋ณด์ž! ์ด๋•Œ โ€œlatent factor์˜ ์ฐจ์›โ€œ์„ ์ •ํ•ด์•ผ ํ•˜๋Š”๋ฐ, $N_f$๋ผ๊ณ  ์„ค์ •ํ•ด๋‘์ž! ๋ณดํ†ต 50์—์„œ 200 ์‚ฌ์ด๋กœ ์„ค์ •ํ•œ๋‹ค๊ณ  ํ•œ๋‹ค. ๊ทธ๋ž˜์„œ MF๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด, ํ–‰๋ ฌ $X$, $Y$๋Š” ๊ฐ๊ฐ $X \in \mathbb{R}^{N_f \times N_u}$, $Y \in \mathbb{R}^{N_f \times N_i}$๊ฐ€ ๋œ๋‹ค.

The <latent factor matrix> $X$ and $Y$ are what we train. They are initialized with very small random values. (💥 They are not produced by splitting up the values of $R$!)

์ด์ œ ์šฐ๋ฆฌ๋Š” factor matrix $X$, $Y$๋ฅผ ํ†ตํ•ด ํ‰์  ํ–‰๋ ฌ์˜ prediction์ธ $\hat{R}$์„ ์œ ๋„ํ•  ๊ฒƒ์ด๋‹ค. ๋ฐฉ๋ฒ•์€ ๊ฐ„๋‹จํ•œ๋ฐ, ๊ทธ๋ƒฅ $X$์™€ $Y$๋ฅผ ๊ณฑํ•ด์ฃผ๋ฉด ๋œ๋‹ค.

\[\hat{R} = X^T \times Y\]

์ด๋•Œ, ํ–‰๋ ฌ $\hat{R}$์˜ ์›์†Œ์ธ $\hat{r}_{ui}$๋Š” ์œ ์ € $u$๊ฐ€ ์•„์ดํ…œ $i$์— ๋Œ€ํ•ด ๋‚ด๋ฆฌ๋Š” ํ‰์ ์„ predictionํ•œ ๊ฒƒ์ด๋‹ค.

\[\hat{r}_{ui} = x_u^T \times y_i\]

์ฆ‰, ์‚ฌ์šฉ์ž์˜ latent vector $x_u$์™€ ์•„์ดํ…œ์˜ latent vector $y_i$๋ฅผ ๊ณฑํ•ด ํ‰์ ์„ ์ถ”๋ก ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ๊ทธ๋ž˜์„œ LMF ๋ชจ๋ธ์˜ ๋ชฉํ‘œ๋Š” \(\hat{r}_{ui}\)๊ฐ€ ์ตœ๋Œ€ํ•œ \(r_{ui}\)์™€ ๊ฐ€๊นŒ์›Œ์ง€๋„๋ก Latent Factor Matrix \(X\), \(Y\)์˜ ๊ฐ’์„ ์กฐ์ •ํ•˜๋Š” ๊ฒƒ์ด๋ผ๊ณ  ๋ณด๋ฉด ๋œ๋‹ค!


How to Train?: Loss

์ด์ œ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•œ ํŒŒํŠธ๋‹ค. ๋ฐฉ๋ฒ•์€ strightforward ํ•œ๋ฐ, ๊ทธ๋ƒฅ \(\hat{r}_ui\)์™€ $r_{ui}$ ์‚ฌ์ด์˜ ๊ฐ’์ด ๊ฐ€๊นŒ์›Œ์ง€๋„๋ก ๋‘ ๊ฐ’์˜ ์ฐจ์ด๊ฐ’์„ minimize ํ•˜๋ฉด ๋œ๋‹ค!

\[L(X, Y) = \sum_{u, i} \left( r_{ui} - x_u^T y_i \right)^2\]

Regularization ํ…€์„ ์ถ”๊ฐ€ํ•ด์ฃผ๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[L(X, Y) = \sum_{u, i} \left( r_{ui} - x_u^T y_i \right)^2 + \lambda \left( \sum_u \left| x_u \right|^2 + \sum_i \left| y_i \right|^2 \right)\]
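The regularized loss above can be computed directly; a small sketch with assumed toy values (`mf_loss` is a hypothetical helper, and here the sum runs over every $(u, i)$ pair):

```python
import numpy as np

# Hypothetical toy data, not from the post.
rng = np.random.default_rng(0)
N_u, N_i, N_f = 4, 5, 3
R = rng.integers(0, 6, size=(N_u, N_i)).astype(float)  # toy rating matrix
X = rng.normal(scale=0.01, size=(N_f, N_u))
Y = rng.normal(scale=0.01, size=(N_f, N_i))
lam = 0.1  # regularization strength lambda

def mf_loss(R, X, Y, lam):
    """Squared error over every (u, i) pair plus L2 penalties on x_u, y_i."""
    err = R - X.T @ Y  # r_ui - x_u^T y_i for all pairs at once
    return (err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum())

loss = mf_loss(R, X, Y, lam)
```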

Optimization

Loss Function์„ ๋””์ž์ธ ํ–ˆ์œผ๋‹ˆ ์ด์ œ Optimization๋งŒ ๋‹ฌ์„ฑํ•˜๋ฉด ๋œ๋‹ค. ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์žˆ๋Š”๋ฐ, <Gradient Descent>์™€ <Alternating Least Squares>, ๋‘ ๊ฐ€์ง€ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์žˆ๋‹ค. ๊ฐ๊ฐ ์–ด๋–ป๊ฒŒ ๋™์ž‘ํ•˜๋Š”์ง€ ์‚ดํŽด๋ณด์ž!

Gradient Descent

GD๋Š” ๊ทธ๋ƒฅ Loss ํ•จ์ˆ˜ $L(X, Y)$๋ฅผ ๋ฏธ๋ถ„ํ•˜๊ณ  ์ด์— ๋Œ€ํ•œ Gradient ๊ฐ’์„ back-propagation ํ•ด์ฃผ๋ฉด ๋œ๋‹ค. ์‹ค์ œ ๋™์ž‘์€ ๋”ฅ๋Ÿฌ๋‹ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์‰ฝ๊ฒŒ ๋‹ฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

๊ฐ„๋‹จํ•˜๊ฒŒ $x_u$์— ๋Œ€ํ•ด์„œ๋งŒ GD๋ฅผ ์ ์šฉํ•ด๋ณด์ž.

\[\begin{aligned} \frac{\partial}{\partial x_u} L(x_u, y_i) &= \frac{\partial}{\partial x_u} \left( \sum_i \left( r_{ui} - x_u^T y_i \right)^2 + \lambda \left( \left| x_u \right|^2 + \sum_i \left| y_i \right|^2 \right)\right) \\ &= \sum_i 2 \left( r_{ui} - x_u^T y_i \right) (-y_i) + \lambda (2 x_u) \end{aligned}\]

์ด์ œ ์ด Gradient ๊ฐ’์œผ๋กœ weight update๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด ๋œ๋‹ค.

\[x_u \leftarrow x_u + \alpha \cdot \left( \sum_i \left( r_{ui} - x_u^T y_i \right) (y_i) - \lambda x_u \right)\]
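As a minimal numpy sketch of this update for a single user vector (toy data assumed; `gd_step_user` is a hypothetical helper):

```python
import numpy as np

# Hypothetical toy data.
rng = np.random.default_rng(0)
N_u, N_i, N_f = 4, 5, 3
R = rng.integers(0, 6, size=(N_u, N_i)).astype(float)
X = rng.normal(scale=0.01, size=(N_f, N_u))
Y = rng.normal(scale=0.01, size=(N_f, N_i))
alpha, lam = 0.01, 0.1

def gd_step_user(u, R, X, Y, alpha, lam):
    """x_u <- x_u + alpha * (sum_i (r_ui - x_u^T y_i) y_i - lam * x_u)."""
    x_u = X[:, u].copy()
    err = R[u] - x_u @ Y          # residuals r_ui - x_u^T y_i over all items
    X[:, u] = x_u + alpha * (Y @ err - lam * x_u)

X0 = X.copy()                     # keep the pre-update factors for comparison
gd_step_user(0, R, X, Y, alpha, lam)
```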

๐Ÿ’ฅ ๋ณธ์ธ์€ $y_i$์˜ ์ฐจ์› ๋•Œ๋ฌธ์— ์œ ๋„๋ฅผ ํ•˜๊ณ ๋„ ์œ„์˜ ์‹์ด ์กฐ๊ธˆ ํ—ท๊ฐˆ๋ ธ๋Š”๋ฐ, ์…‹์—…์„ ๋‹ค์‹œ ๋ณด๋‹ˆ๊นŒ, $y_i$๊ฐ€ $N_f$ ์ฐจ์›์˜ ์—ด๋ฒกํ„ฐ์˜€๋‹ค ใ…‹ใ…‹ใ…‹

Gradient Descent ๋ฐฉ์‹์˜ ๋‹จ์ ์€ ์ตœ์ ํ™”๋ฅผ ์‹œํ‚ค๋Š” ๊ณผ์ •์ด ๋„ˆ๋ฌด ๋Š๋ฆฌ๊ณ , ๋งŽ์€ ๋ฐ˜๋ณต์ด ํ•„์š”ํ•˜๋‹ค. ๋˜, Global minimum์ด ์•„๋‹Œ local minimum์— stuckํ•  ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋‹ค ๋“ฑ๋“ฑ์˜ ๋‹จ์ ์ด ์žˆ๋‹ค. ๋‹จ์ ์ด ์žˆ๊ธด ์žˆ๋‹ค ๋‘๋ฒˆ์งธ ๋ฐฉ๋ฒ•์ธ <Alternating Least Squares>๋Š” ์ด๋Ÿฐ ๋ฌธ์ œ๋ฅผ ์Šค๋งˆํŠธํ•˜๊ฒŒ ํ•ด๊ฒฐํ•œ๋‹ค! ๐Ÿ˜Ž


Alternating Least Squares

<Alternating Least Squares>์˜ ์ปจ์…‰์€ $X$, $Y$ ๋‘˜ ์ค‘ ํ•˜๋‚˜๋ฅผ ๊ณ ์ •์‹œํ‚ค๊ณ , ๋‹ค๋ฅธ ํ•˜๋‚˜๋ฅผ ์ตœ์ ํ™” ์‹œํ‚จ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์ด ๊ณผ์ •์„ ๋ฒˆ๊ฐˆ์•„๊ฐ€๋ฉด ๋ฐ˜๋ณต, ์ฆ‰ alternating ํ•˜๋ฉด์„œ ์งง์€ ์‹œ๊ฐ„ ๋‚ด์— ์ตœ์ ์˜ $X$, $Y$๋ฅผ ์ฐพ์•„๋‚ธ๋‹ค! (๋‘ ํ–‰๋ ฌ์„ ํ•œ๊บผ๋ฒˆ์— ์ตœ์ ํ™”์‹œํ‚ค๋Š” ๊ฒƒ์€ ์–ด๋ ต๋‹ค ๐Ÿ’ซ)

๋จผ์ € <ALS>์˜ loss๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋œ๋‹ค.

\[L(X, Y) = \sum_{u, i} c_{ui} \cdot \left(p_{ui} - x_u^T y_i \right)^2 + \lambda \left( \sum_u \left| x_u \right|^2 + \sum_i \left| y_i \right|^2 \right)\]

<GD>์—์„œ์˜ Loss์™€ ์กฐ๊ธˆ ๋‹ฌ๋ผ์กŒ๋Š”๋ฐ, $c_{ui}$์™€ $p_{ui}$๊ฐ€ ์ถ”๊ฐ€๋˜์—ˆ๋‹ค. ์ด ๋‘ ๊ฐ’์€ ํ‰์ ์„ ์„ ํ˜ธ๋„(preference) $p$์™€ ์‹ ๋ขฐ๋„(confidence) $c$๋กœ ๋‚˜๋ˆ„์–ด ์ ‘๊ทผํ•œ ๊ฒƒ์ด๋ผ๊ณ  ์„ค๋ช…ํ•œ๋‹ค.


๋จผ์ €, ์„ ํ˜ธ๋„ $p_{ui}$๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋œ๋‹ค. ํ‰์  $r_{ui}$์˜ ๊ฐ’์— ์˜ํ•ด ์ •์˜๋œ๋‹ค.

\[p_{ui} = \begin{cases} 1 & \text{if} \; r_{ui} > 0 \\ 0 & \text{if} \; r_{ui} = 0 \end{cases}\]

์ด๊ฒƒ์€ ์œ ์ €๊ฐ€ ํ‰์ ์„ ๋‚จ๊ฒผ๋‹ค๋ฉด($r_{ui} > 0$), ์œ ์ €๊ฐ€ ์„ ํ˜ธ๋„๋ฅผ ๊ฐ€์ง„๋‹ค๋Š” ๊ฒƒ์„ ํ‘œํ˜„ํ•œ ์‹์ด๋‹ค. (<ALS>๋Š” Implicit Dataset์—์„œ ์ด๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ์ž๊ฐ€ ์„ ํ˜ธ์™€ ๋น„์„ ํ˜ธ๋ฅผ ๊ตฌ๋ถ„ํ•˜์ง€ ์•Š๋Š”๋‹ค!)


๋‹ค์Œ์œผ๋กœ ์‹ ๋ขฐ๋„ $c_{ui}$๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋œ๋‹ค. ์‹ค์ œ๋ก  ์„ ํ˜ธํ•˜์ง€๋งŒ, ํ‰์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์œ„ํ•ด ๋„์ž…ํ•œ ๊ฐ’์ด๋‹ค.

\[c_{ui} = 1 + \alpha r_{ui}\]

์šฐ๋ฆฌ๋Š” ์„ ํ˜ธ๋„ $p_{ui}$๋ฅผ ํ†ตํ•ด ํ‰์ ์€ ์—†๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ๋‘ 0์œผ๋กœ ๋ฐ”๊พธ์—ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์—ฌ๊ธฐ์—๋Š” ์‹ค์ œ๋ก  ์„ ํ˜ธํ•˜์ง€๋งŒ, ํ‰์ ์ด ์—†๋Š” ๊ฒฝ์šฐ๋„ ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค. <ALS>์—์„œ๋Š” ์ด ๊ฒฝ์šฐ๋ฅผ ๋ฐ์ดํ„ฐ์˜ ์‹ ๋ขฐ๋„๊ฐ€ ๋‚ฎ์€ ๊ฒƒ์œผ๋กœ ํ•ด์„ํ•œ๋‹ค!

<ALS>๋Š” ์‹ ๋ขฐ๋„ ๋ณ€์ˆ˜๋ฅผ ๋„์ž…ํ•ด ํ‰์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก๋„ ์ „์ฒด Loss Function์— ์˜ํ–ฅ์„ ์ฃผ๋„๋ก ๋งŒ๋“ค์—ˆ๋‹ค. ์ด๊ฒƒ์€ ํ‰์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ๋‘ ํ•™์Šต์—์„œ ์ œ์™ธํ•œ <Explicit Dataset>๊ณผ๋Š” ๋Œ€์กฐ๋˜๋Š” ์ ์ด๋‹ค! ๐Ÿ˜ฒ

์‹ ๋ขฐ๋„ ๋ณ€์ˆ˜ $c_{ui}$์—๋Š” ํ‰์  $r_{ui}$๊ฐ€ ์žˆ๋Š”๋ฐ, ์ด๊ฒƒ์„ ํ†ตํ•ด ํ‰์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ์—๋Š” ๋‚ฎ์€ $c$ ๊ฐ’์„ ๋ถ€์—ฌํ•ด loss์— ํฌํ•จํ•˜๋˜ ํ•™์Šต์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์ด ์ž‘๋„๋ก ๋งŒ๋“ค์—ˆ๋‹ค.


โ€˜๊ฐˆ์•„๋จน๋Š” ์ถ”์ฒœ ์•Œ๊ณ ๋ฆฌ์ฆ˜โ€™์˜ ์ €์ž๋ถ„๊ป˜์„œ๋Š” ์‹ ๋ขฐ๋„ ๋ณ€์ˆ˜ $c_{ui}$๋ฅผ ๋„์ž…ํ•˜๋Š” ์ด์œ ๋ฅผ โ€œImplicit Dataset์— ํ‰์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ํ›จ์”ฌ ๋งŽ์•„, ์‹ค์ œ ๋ฐ์ดํ„ฐ์…‹์€ ํ›จ์”ฌ sparseํ•œ matrixโ€๋ผ๊ณ  ์„ค๋ช…ํ•ด์ฃผ์…จ๋‹ค. ๋งŒ์•ฝ, ํ‰์ ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐฐ์ œํ•˜๊ณ  ํ•™์Šต์„ ์ง„ํ–‰ํ•œ๋‹ค๋ฉด, ์ด๊ฒƒ์€ ํ•™์Šต์— ๋Œ€ํ•œ ์˜ฌ๋ฐ”๋ฅธ ์ ‘๊ทผ์ด ์•„๋‹ˆ๋ฉฐ, ์„ค๋ช… ๊ฐ€์ค‘์น˜๊ฐ€ ๋‚ฎ๋”๋ผ๋„ ๊ทธ ์ˆ˜๊ฐ€ ์›”๋“ฑํžˆ ๋งŽ๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต์— ์œ ์˜๋ฏธํ•œ ์˜ํ–ฅ์„ ๋ฏธ์น˜๊ฒŒ ๋œ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค.

์ด๋ฒˆ์—๋Š” ์‹ค์ œ๋กœ <ALS>์˜ ๋™์ž‘์„ ์‚ดํŽด๋ณด์ž!

1. ๋จผ์ € ์‚ฌ์šฉ์ž์™€ ์•„์ดํ…œ์˜ Latent Factor ํ–‰๋ ฌ์„ ์•„์ฃผ ์ž‘์€ ๋žœ๋ค๊ฐ’์œผ๋กœ ์ดˆ๊ธฐํ™” ํ•œ๋‹ค.

2. ๋‘˜ ์ค‘ ํ•˜๋‚˜๋ฅผ ์ƒ์ˆ˜๋กœ ๊ณ ์ •์‹œ์ผœ, Loss Function์„ Convex Function์œผ๋กœ ๋งŒ๋“ ๋‹ค.

3. Loss๋ฅผ ํŽธ๋ฏธ๋ถ„ ํ•œ๋‹ค. ๋ฏธ๋ถ„ ๊ฐ’์„ 0์œผ๋กœ ๋งŒ๋“œ๋Š” ํ–‰๋ ฌ์„ ๊ณ„์‚ฐํ•œ๋‹ค.

4. [2-3] ๋ฐ˜๋ณต

์•„์ดํ…œ์˜ Latent Factor๋ฅผ ๊ณ ์ •ํ•˜๊ณ , ์‚ฌ์šฉ์ž์˜ LF๋ฅผ ์ตœ์ ํ™” ์‹œ์ผœ๋ณด์ž.

์•„์ดํ…œ ํ–‰๋ ฌ์„ ๊ณ ์ •ํ•˜๊ณ , <ALS>์˜ Loss๋ฅผ ์‚ฌ์šฉ์ž $x_u$์— ๋Œ€ํ•ด ๋ฏธ๋ถ„ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\begin{aligned} \frac{\partial L(x_u)}{\partial x_u} &= \frac{\partial }{\partial x_u} \left[ \sum_{i} c_{ui} \cdot \left(p_{ui} - x_u^T y_i \right)^2 + \lambda \left( \left| x_u \right|^2 + \sum_i \left| y_i \right|^2 \right) \right] \\ &= \left[\sum_{i} c_{ui} \cdot 2 \left( p_{ui} - x^T_u y_i \right) ( -y_i)\right] + 2 \lambda x_u = 0 \end{aligned}\]

์‹์„ ์ •๋ฆฌํ•˜๋ฉด,

\[\left(\sum_i c_{ui} \cdot x_u^T y_i \cdot y_i\right) + \lambda x_u = \sum_i c_{ui} \cdot p_{ui} \cdot y_i\]

์‹์„ $x_u$์— ๋Œ€ํ•ด ์ •๋ฆฌํ•ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, scalar $x_u^T y_i$๋ฅผ $y_i^T x_u$๋กœ ๋ฐ”๊ฟ”์ค€๋‹ค.

\[\left(\sum_i c_{ui} \cdot y_i^T x_u \cdot y_i\right) + \lambda x_u = \sum_i c_{ui} \cdot p_{ui} \cdot y_i\]

์ด์ œ ์ขŒ๋ณ€์„ $x_u$์— ๋Œ€ํ•ด ๋ฌถ์–ด์ค€๋‹ค.

\[\begin{aligned} \left(\sum_i c_{ui} \cdot y_i^T x_u \cdot y_i\right) + \lambda x_u &= \left(\sum_i c_{ui} \cdot y_i \cdot y_i^T x_u \right) + \lambda x_u \\ &= \left(\sum_i c_{ui} \cdot y_i y_i^T \right) x_u + \lambda x_u \\ &= \left[ \left( \sum_i c_{ui} \cdot y_i y_i^T \right) + \lambda I \right] x_u \end{aligned}\]

์œ„์˜ ์‹์—๋Š” ํ•ฉ(ๅˆ)์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์–ด ์‹์„ ๊ฒฐ๊ณผ๋ฅผ ์–ป๋Š”๋ฐ ์กฐ๊ธˆ ๋ถˆํŽธํ•˜๋‹ค. ๊ทธ๋ž˜์„œ ์•„๋ž˜์˜ ๊ณผ์ •์„ ํ†ตํ•ด ์‹์„ ์ข€๋” ๋‹จ์ˆœํ™”ํ•ด๋ณด์ž!

์•„์ดํ…œ ํ–‰๋ ฌ $Y = [y_1, y_2, \cdots, y_i]$์— ๋Œ€ํ•ด $Y \times Y^T$๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[YY^T = \left[ y_1 y_1^T + y_2 y_2^T + \cdots + y_i y_i^T \right] = \sum_i y_i y_i^T\]

ํ•˜์ง€๋งŒ, ์œ„์˜ ์‹์—๋Š” ์‹ ๋ขฐ๋„ $c_{ui}$๊ฐ€ ๋น ์ ธ์žˆ๋‹ค. ์ด๊ฒƒ์€ diag matrix๋ฅผ ํ†ตํ•ด ์‰ฝ๊ฒŒ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค!

Let $C_u$ be

\[C_u = \begin{pmatrix} c_{u1} & 0 & 0 & 0 \\ 0 & c_{u2} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & c_{uN_i} \\ \end{pmatrix}\]

then, $YC_uY^T$ is

\[YC_uY^T = c_{u1} y_1 y_1^T + c_{u2} y_2 y_2^T + \cdots + c_{uN_i} y_{N_i} y_{N_i}^T = \sum_i c_{ui} y_i y_i^T\]
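This identity is easy to check numerically on toy data (shapes assumed, with the columns of $Y$ as the item vectors):

```python
import numpy as np

# Numerical check of Y C_u Y^T = sum_i c_ui * y_i y_i^T (toy shapes assumed).
rng = np.random.default_rng(0)
N_f, N_i = 3, 5
Y = rng.normal(size=(N_f, N_i))          # columns are the item vectors y_i
c_u = rng.uniform(1.0, 10.0, size=N_i)   # one confidence value per item
C_u = np.diag(c_u)

lhs = Y @ C_u @ Y.T
rhs = sum(c_u[i] * np.outer(Y[:, i], Y[:, i]) for i in range(N_i))
assert np.allclose(lhs, rhs)
```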

์šฐ๋ณ€๋„ ์‹์„ ์ •๋ฆฌํ•ด๋ณด์ž.

\[\begin{aligned} Y^T C_u p_u &= [y_1, \dots, y_i] \begin{pmatrix} c_{u1} & 0 & 0 & 0 \\ 0 & c_{u2} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & c_{ui} \\ \end{pmatrix} \begin{pmatrix} p_{u1} \\ \vdots \\ p_{ui} \end{pmatrix} \\ &= [c_{u1}y_1, \dots, c_{ui}y_i] \begin{pmatrix} p_{u1} \\ \vdots \\ p_{ui} \end{pmatrix} \\ &= p_{u1}c_{u1}y_1 + \cdots + p_{ui}c_{ui}y_i = \sum_i c_{ui}p_{ui}y_i \end{aligned}\]

์ด์ œ ์œ„์˜ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ (์ค€์‹)์„ ๋‹ค์‹œ ๊ธฐ์ˆ ํ•ด๋ณด์ž.

\[\begin{aligned} \left[ \left( \sum_i c_{ui} \cdot y_i y_i^T \right) + \lambda I \right] x_u &= \sum_i c_{ui} \cdot p_{ui} \cdot y_i \\ (YC_uY^T + \lambda I) x_u &= Y^T C_u p_u \end{aligned}\]

์ด์ œ $x_u$๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด ์ขŒ๋ณ€์˜ ํ–‰๋ ฌ์„ ์šฐ๋ณ€์œผ๋กœ ์˜ฎ๊ธฐ๋ฉด,

\[x_u = (YC_uY^T + \lambda I)^{-1} \cdot Y^T C_u p_u\]

์‹์„ ์•ฝ๊ฐ„ ๋‹ค๋“ฌ์œผ๋ฉด,

\[x_u = (Y^TC_uY + \lambda I)^{-1} \cdot Y^T C_u p_u\]

๋!! ์ด๋ ‡๊ฒŒ ๊ตฌํ•œ $x_u$๋กœ ์‚ฌ์šฉ์ž ํ–‰๋ ฌ $X$๋ฅผ ์—…๋ฐ์ดํŠธ ํ•˜๋ฉด ๋œ๋‹ค!!

๋‹ค์Œ์—๋Š” ์‚ฌ์šฉ์ž ํ–‰๋ ฌ $X$๋ฅผ ๊ณ ์ •ํ•˜๊ณ , ์•„์ดํ…œ ํ–‰๋ ฌ $Y$์— ๋Œ€ํ•ด ๋™์ผํ•œ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•ด $y_i$๋ฅผ ๊ตฌํ•˜๋ฉด,

\[y_u = (X^TC_iX + \lambda I)^{-1} \cdot X^T C_i p_i\]
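The whole alternating loop can then be sketched as follows, in the row-vector convention matching the final closed-form updates ($X$ has rows $x_u$, $Y$ has rows $y_i$); toy data and hyperparameters are assumed, not taken from the post:

```python
import numpy as np

# Toy data and hyperparameters, all assumed.
rng = np.random.default_rng(0)
N_u, N_i, N_f = 6, 8, 3
R = (rng.integers(0, 2, size=(N_u, N_i))
     * rng.integers(1, 6, size=(N_u, N_i))).astype(float)
alpha, lam, n_iters = 40.0, 0.1, 10

P = (R > 0).astype(float)     # preferences p_ui
C = 1.0 + alpha * R           # confidences c_ui

# Row-vector convention: X is (N_u, N_f), Y is (N_i, N_f).
X = rng.normal(scale=0.01, size=(N_u, N_f))
Y = rng.normal(scale=0.01, size=(N_i, N_f))
I = np.eye(N_f)

for _ in range(n_iters):
    # Fix Y: solve (Y^T C_u Y + lam I) x_u = Y^T C_u p_u for every user.
    for u in range(N_u):
        Cu = np.diag(C[u])
        X[u] = np.linalg.solve(Y.T @ Cu @ Y + lam * I, Y.T @ Cu @ P[u])
    # Fix X: solve (X^T C_i X + lam I) y_i = X^T C_i p_i for every item.
    for i in range(N_i):
        Ci = np.diag(C[:, i])
        Y[i] = np.linalg.solve(X.T @ Ci @ X + lam * I, X.T @ Ci @ P[:, i])

P_hat = X @ Y.T               # predicted preference scores
```

Using `np.linalg.solve` on the linear system, rather than forming the matrix inverse explicitly, is both faster and numerically safer.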


๋ณดํ†ต ๋ฐ˜๋ณต ํšŸ์ˆ˜๋ฅผ 10~15ํšŒ ์ •๋„๋กœ ์ง„ํ–‰ํ•˜๋ฉด, ๋ฐ์ดํ„ฐ ํฌ๊ธฐ์™€ sparse ์ •๋„์— ๋”ฐ๋ผ ํšŸ์ˆ˜๋Š” ์กฐ์ •๋œ๋‹ค๊ณ  ํ•œ๋‹ค.

<ALS> ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๋Œ€ํ•œ ๊ตฌํ˜„์€ โ€˜๊ฐˆ์•„๋จน๋Š” ์ถ”์ฒœ ์•Œ๊ณ ๋ฆฌ์ฆ˜โ€™์˜ ์ €์ž โ€˜ํ˜•์ค€ํ‚ดโ€™๋‹˜์˜ ํฌ์ŠคํŠธ๋ฅผ ์ฐธ๊ณ ํ•˜๊ธธ ๋ฐ”๋ž€๋‹ค.

๐Ÿ‘‰ ๊ฐˆ์•„๋จน๋Š” ์ถ”์ฒœ ์•Œ๊ณ ๋ฆฌ์ฆ˜ [5] ALS ๊ตฌํ˜„ํ•˜๊ธฐ


References