๋ณธ ๊ธ€์€ 2020-2ํ•™๊ธฐ โ€œ์ปดํ“จํ„ฐ ๋น„์ „โ€ ์ˆ˜์—…์˜ ๋‚ด์šฉ์„ ๊ฐœ์ธ์ ์œผ๋กœ ์ •๋ฆฌํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)


๋ณธ ๊ธ€์€ 2020-2ํ•™๊ธฐ โ€œ์ปดํ“จํ„ฐ ๋น„์ „โ€ ์ˆ˜์—…์˜ ๋‚ด์šฉ์„ ๊ฐœ์ธ์ ์œผ๋กœ ์ •๋ฆฌํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ง€์ ์€ ์–ธ์ œ๋‚˜ ํ™˜์˜์ž…๋‹ˆ๋‹ค :)

keywords

  • Metric
    • semantic distance
    • Mahalanobis Distance
  • Metric Learning: a first approach
  • LMNN

What is a Metric?

Metric์€ ์ง‘ํ•ฉ ์•ˆ์— ์žˆ๋Š” ๋ชจ๋“  ์›์†Œ ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌdistance๋ฅผ ์ˆ˜์น˜ํ™”ํ•˜๋Š” ํ•จ์ˆ˜function์ด๋‹ค.

์ผ๋ฐ˜์ ์œผ๋กœ Euclid distance, Manhattan distance, cosine similarity ๋“ฑ์ด Metric function์œผ๋กœ ์“ฐ์ธ๋‹ค.

A metric measures the distance between two objects, but flipped around, it can also be used to judge the similarity between them!


์ผ๋ฐ˜์ ์ธ Metric์ด ๋‘ ๋Œ€์ƒ ์‚ฌ์ด์˜ ๊ธฐํ•˜ํ•™์  ํŠน์ง•์„ ์ด์šฉํ•ด ๊ฑฐ๋ฆฌ๋ฅผ ๋ถ€์—ฌํ•œ๋‹ค๋ฉด, ์ปดํ“จํ„ฐ ๋น„์ „, ๊ทธ ์ค‘์—์„œ๋„ Metric Learning์€ ๋‘ ๋Œ€์ƒ ์‚ฌ์ด์˜ semantic distance์— ์ฃผ๋ชฉํ•œ๋‹ค!


Pairwise vs. Triplet

Metric์„ ํ•™์Šต ๋ฐฉ๋ฒ•์—๋Š” ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€ ์ ‘๊ทผ์ด ์žˆ๋‹ค.

Pairwise

๋…ผ์˜์˜ ํŽธ์˜๋ฅผ ์œ„ํ•ด ๋Œ€์ƒ $\{ x_1, x_2, x_3 \}$์— ๋Œ€ํ•œ Metric์„ ํ•™์Šตํ•œ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜์ž.

์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” ๋‘ ํ•จ์ˆ˜ $D$์™€ $f$๋ฅผ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

The function $D$ takes two inputs and returns the distance between them.

\[D(x, y) = \textrm{distance btw two objects}\]

The function $f$ is a feature extractor that extracts features from an object.

\[f(x) = \textrm{feature vector}\]

Now suppose we combine the two functions $D$ and $f$ to learn the following pairwise relations.

\[D(f(x_1), f(x_2)) \downarrow \quad D(f(x_1), f(x_3)) \uparrow\]

In other words, metric learning aims to drive the distance between the two objects' extracted features down for a similar pair and up for a dissimilar pair.

This is similar in spirit to loss minimization.
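To make the pairwise objective concrete, here is a minimal contrastive-style sketch of my own (not from the lecture), with the feature extractor $f$ left as a stand-in identity function:

```python
import numpy as np

def f(x):
    # stand-in feature extractor; in practice this would be learned
    return np.asarray(x, dtype=float)

def D(u, v):
    # Euclidean distance between two feature vectors
    return np.linalg.norm(u - v)

def pairwise_loss(x, y, similar, margin=1.0):
    """Contrastive-style loss: decrease D for similar pairs,
    increase D for dissimilar pairs (up to a margin)."""
    d = D(f(x), f(y))
    if similar:
        return d ** 2                     # drives D(f(x1), f(x2)) down
    return max(0.0, margin - d) ** 2      # drives D(f(x1), f(x3)) up

x1, x2, x3 = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
print(pairwise_loss(x1, x2, similar=True))   # small: pair is already close
print(pairwise_loss(x1, x3, similar=False))  # 0.0: pair is already past the margin
```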

Triplet

๋ฐ˜๋ฉด์— Triplet์€ ์•„๋ž˜์™€ ๊ฐ™์€ relation์„ ํ•™์Šต์‹œํ‚จ๋‹ค.

\[D(f(x_1), f(x_2)) < D(f(x_1), f(x_3))\]

That is, instead of directly decreasing or increasing the numeric distance values, the goal is to learn the ordering between distance values.

Triplet relation์„ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ์€ ๋‹จ์ˆœํžˆ ๋‹ค์†Œ ๊ด€๊ณ„๋งŒ ๋งŒ์กฑ์‹œํ‚ค๋ฉด ๋˜๊ธฐ ๋•Œ๋ฌธ์— Pairwise relation์„ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ๋ณด๋‹ค ๋” ์œ ์—ฐflexibleํ•˜๋‹ค๊ณ  ํ•œ๋‹ค.


์ข…ํ•ฉํ•˜์ž๋ฉด, Metric Learning์€ ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์…‹์—์„œ ํ•จ์ˆ˜ $D$, $f$๊ฐ€ pairwise relation ๋˜๋Š” triplet relation์„ ์ž˜ ์ถœ๋ ฅํ•˜๋„๋ก ํ•™์Šต์‹œํ‚ค๋Š” ๋ถ„์•ผ๋‹ค.



Classical Metric Learning

Deep Learning ์ด์ „์˜ Metric Learning์— ๋Œ€ํ•œ ๋ถ€๋ถ„์ด๋‹ค.

These classical methods would not be used in actual research today, but they supply the motivation and inspiration behind today's DL-based approaches.

Mahalanobis Distance

๋‘ ์ ์— ๋Œ€ํ•œ Euclidean metric $D_E$๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜ํ•œ๋‹ค.

\[D_E(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^T (\mathbf{x}-\mathbf{y})}\]

The Mahalanobis distance $D_M$ is then defined as follows.

\[D_M(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^T M (\mathbf{x}-\mathbf{y})}\]


์ž๋ฃŒ๋ฅผ ๋” ์ฐพ์•„๋ณด๋‹ˆ, Mahalanobis distance๋Š” multi-variate distribution์—์„œ ๊ฑฐ๋ฆฌ๋ฅผ ์žฌ๋Š” ์ข‹์€ ๋„๊ตฌ๋ผ๊ณ  ํ•œ๋‹ค.

multi-variate distribution ์ƒ์˜ ํ•œ ์ ์„ $\mathbf{x}$๋ผ๊ณ  ํ•˜๊ณ , distribution์˜ ํ‰๊ท ์„ $\mu$, ๋ถ„์‚ฐ์„ $\Sigma$๋ผ๊ณ  ํ–ˆ์„ ๋•Œ Mahalanobis distance๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[D_{\Sigma}(\mathbf{x}, \mu) = \sqrt{(\mathbf{x}-\mu)^T \, {\Sigma^{-1}} \, (\mathbf{x}-\mu)}\]

ํฅ๋ฏธ๋กœ์šด ์ ์€ multi-variate normal distribution $\mathcal{N}(\mathbf{x})$์—๋„ Mahalanobis distance๊ฐ€ ๋“ฑ์žฅํ•œ๋‹ค.

\[\begin{aligned} \mathcal{N}(\mathbf{x}) &= \frac{1}{\sqrt{(2\pi)^k \lvert \Sigma \rvert}} \exp{\left( - \frac{1}{2} (\mathbf{x}-\mu)^T \, {\Sigma^{-1}} \, (\mathbf{x}-\mu) \right)} \\ &= \frac{1}{\sqrt{(2\pi)^k \lvert \Sigma \rvert}} \exp{\left( - \frac{1}{2} {\left(D_{\Sigma}\right)}^2 \right)} \end{aligned}\]


Mahalanobis distance $D_M$์—์„œ ์ฃผ๋ชฉํ•  ์ ์€ ํ–‰๋ ฌ $M$์„ ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹คlearnable๋Š” ๊ฒƒ์ด๋‹ค!

We learn $M$ from a given dataset.1 The most intuitive, unsupervised approach is to compute the dataset's covariance matrix $\Sigma$ and take its inverse as $M$: $M={\Sigma}^{-1}$
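A minimal NumPy sketch of this unsupervised choice (my own illustration on toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy dataset: correlated 2-D samples
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 2]], size=500)

# unsupervised choice: M = inverse of the dataset covariance
M = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(x, y, M):
    d = x - y
    return np.sqrt(d @ M @ d)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(mahalanobis(x, y, M))            # accounts for correlation between axes
print(mahalanobis(x, y, np.eye(2)))    # reduces to Euclidean (≈ 1.414) when M = I
```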


์ˆ˜์—…์—์„œ๋Š” Mahalanobis distance์„ ๋‹ค๋ฃจ๋Š” ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœํ•˜์˜€๋‹ค.

  • A first approach to distance metric learning
  • Large Margin Nearest Neighbor (LMNN)



A first approach to distance metric learning

๋ฐ์ดํ„ฐ์…‹์—์„œ pair๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ๋‘ ์ง‘ํ•ฉ $S^{+}$, $S^{-}$๋ฅผ ๋งŒ๋“ ๋‹ค.

\[\begin{aligned} S^{+} &= \textrm{The set of similar pairs} \\ S^{-} &= \textrm{The set of dissimilar pairs} \end{aligned}\]

Then set up the following optimization problem.

\[M^{*} = \underset{M}{\textrm{argmin}} \sum_{\left(\mathbf{x}_i, \mathbf{x}_j\right) \in S^{+}} (\mathbf{x}_i - \mathbf{x}_j)^{T} \, M \, (\mathbf{x}_i - \mathbf{x}_j)\]

Naturally, the distance for a pair $(\mathbf{x}_i, \mathbf{x}_j)$ belonging to $S^{+}$ should be small.

์ด๋•Œ, ์œ„์˜ ์ตœ์ ํ™” ๋ฌธ์ œ์—์„œ $S^{-}$๋ฅผ ๊ณ ๋ คํ•œ constraint๋ฅผ ์ถ”๊ฐ€ํ•ด์ค€๋‹ค!

\[\begin{aligned} M^{*} &= \underset{M}{\textrm{argmin}} \sum_{\left(\mathbf{x}_i, \mathbf{x}_j\right) \in S^{+}} (\mathbf{x}_i - \mathbf{x}_j)^{T} \, M \, (\mathbf{x}_i - \mathbf{x}_j) \\ & \textrm{s.t.} \; \sum_{\left(\mathbf{x}_i, \mathbf{x}_j\right) \in S^{-}} (\mathbf{x}_i - \mathbf{x}_j)^{T} \, M \, (\mathbf{x}_i - \mathbf{x}_j) \ge 1 \end{aligned}\]

In other words, the distance for a pair $(\mathbf{x}_i, \mathbf{x}_j)$ belonging to $S^{-}$ must be guaranteed to stay at least a certain amount, a margin, apart.
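As a rough sketch of how one might solve this numerically (my own illustration, not the algorithm actually used in the paper), we can parametrize $M = L^{T} L$ so it stays positive semi-definite and run gradient descent on the $S^{+}$ term with a soft penalty for the $S^{-}$ constraint:

```python
import numpy as np

def pair_term(L, pairs, X):
    # sum of (x_i - x_j)^T M (x_i - x_j) with M = L^T L,
    # i.e. squared norms of L (x_i - x_j)
    return sum(np.sum((L @ (X[i] - X[j])) ** 2) for i, j in pairs)

def loss(L, X, S_pos, S_neg, penalty=10.0):
    # minimize distances over S+, softly enforcing sum over S- >= 1
    constraint = max(0.0, 1.0 - pair_term(L, S_neg, X))
    return pair_term(L, S_pos, X) + penalty * constraint

# toy data: two tight clusters separated along the x-axis
X = np.array([[0.0, 0.0], [0.1, 1.0], [2.0, 0.0], [2.1, 1.0]])
S_pos = [(0, 1), (2, 3)]   # similar pairs (same cluster)
S_neg = [(0, 2), (1, 3)]   # dissimilar pairs (across clusters)

L = np.eye(2)
eps, lr = 1e-4, 0.05
for _ in range(300):
    # numerical gradient descent on the entries of L
    grad = np.zeros_like(L)
    for a in range(2):
        for b in range(2):
            Lp = L.copy(); Lp[a, b] += eps
            Lm = L.copy(); Lm[a, b] -= eps
            grad[a, b] = (loss(Lp, X, S_pos, S_neg)
                          - loss(Lm, X, S_pos, S_neg)) / (2 * eps)
    L -= lr * grad

M = L.T @ L
# the learned M shrinks the within-cluster (y) direction relative to x
print(M)
```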


์ด ๋ฐฉ์‹์œผ๋กœ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ํ’€์–ด์„œ $M^{*}$์„ ๊ตฌํ–ˆ๋‹ค๋ฉด, Mahalanobis distance๋Š” ๋ฐ์ดํ„ฐ์…‹์„ ์•„๋ž˜์™€ ๊ฐ™์ด ์ž˜ ๋ถ„ํ• ํ•˜๊ฒŒ ๋œ๋‹ค.

<img src="/images/computer-science/computer-vision/mahalanobis-dist-result.png" style="width:60%;">

"Distance metric learning with application to clustering with side-information", NIPS 2002


Large Margin Nearest Neighbor

Large Margin Nearest Neighbor, LMNN for short, adopts a somewhat more complex objective function. Let's take a look!


First, look at the part for $S^{+}$. LMNN likewise aims to minimize the sum of distances over positive pairs.

\[\begin{aligned} M^{*} &= \underset{M}{\textrm{argmin}} \sum_{i, j} \eta_{ij} \cdot {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 \\ & \textrm{where} \\ & \eta_{ij} = \begin{cases} 1, \quad (\mathbf{x}_i, \mathbf{x}_j) \in S^{+}\\ 0, \quad (\mathbf{x}_i, \mathbf{x}_j) \in S^{-} \end{cases} \end{aligned}\]

$\eta_{ij}$๋ผ๋Š” indicator variable์„ ๋„์ž…ํ•ด positive pair์˜ ๊ฑฐ๋ฆฌํ•ฉ์„ ์ตœ์†Œํ™”ํ•˜๋„๋ก ๋””์ž์ธ ํ–ˆ๋‹ค.


์—ฌ๊ธฐ์„œ ๋๋‚˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ $S^{-}$์— ๋Œ€ํ•œ ๋ถ€๋ถ„๋„ ๊ณ ๋ คํ•œ ํ…€์„ ์ถ”๊ฐ€ํ•ด์ค€๋‹ค. ํ•ด๋‹น ํ…€๋งŒ ๋”ฐ๋กœ ์ž‘์„ฑํ•ด๋ณด๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\sum_{i, j, k} \eta_{ij}(1-\eta_{ik}) \cdot h\left[ 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \right]\]

์œ„ ํ…€์€ ์„ธ ์  $\{\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_k\}$์— ๋Œ€ํ•œ Triplet relation์„ ๊ณ ๋ คํ•˜๋Š” ํ…€์œผ๋กœ $(\mathbf{x}_i, \mathbf{x}_j) \in S^{+}$์ด๊ณ , $(\mathbf{x}_i, \mathbf{x}_k) \in S^{-}$์ผ ๋•Œ๋ฅผ ๊ณ ๋ คํ•œ๋‹ค.

Here, $h\left[ 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \right]$ is the hinge function, given below.

\[h\left[ 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \right] \\ = \max \left(0, 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \right)\]

์ด hinge function์˜ ๊ฐ’์„ ์ตœ์†Œํ™”ํ•˜๋ ค๋ฉด $1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \le 0$ ์ด ๋˜์–ด์•ผ ํ•œ๋‹ค. ๊ทธ๋ž˜์•ผ hinge function์˜ ๊ฐ’์ด 0์ด ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

์ด๊ฒƒ์„ ๋‹ค์‹œ ์“ฐ๋ฉด

\[\begin{aligned} 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 & \le 0 \\ 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 & \le {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \end{aligned}\]

์˜ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•ด์•ผ ํ•˜๋Š” ๊ฒƒ์ด ๋œ๋‹ค.

์ด๊ฒƒ์€ negative-pair dist๊ฐ€ positive-pair dist๋ณด๋‹ค Margin $1$ ๋งŒํผ ๋” ๋ฉ€๋ฆฌ์žˆ๋„๋ก ๋งŒ๋“ ๋‹ค. ์ฆ‰,

\[\begin{aligned} 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 & \le {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \\ \textrm{Margin} + \left(\textrm{positive-pair dist}\right)^2 & \le \left(\textrm{negative-pair dist}\right)^2 \end{aligned}\]

Combining the pairwise term and the triplet term now gives the following.

\[M^{*} = \underset{M}{\textrm{argmin}} \left\{ \sum_{i, j} \eta_{ij} \cdot {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 \right\} + c \left\{ \sum_{i, j, k} \eta_{ij}(1-\eta_{ik}) \cdot h\left[ 1 + {D_M\left(\mathbf{x}_i, \mathbf{x}_j\right)}^2 - {D_M\left(\mathbf{x}_i, \mathbf{x}_k\right)}^2 \right] \right\}\]

In short,

1. If the pair $(\mathbf{x}_i, \mathbf{x}_j)$ is a positive pair, its distance is reduced so that the two points are pulled closer together; Pull

2. If the pair $(\mathbf{x}_i, \mathbf{x}_k)$ is a negative pair, its distance is pushed out past the largest positive-pair distance with a margin of 1 to spare; Push
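Putting both terms into code, here is my own NumPy sketch of the objective evaluated at a fixed $M$ (real LMNN optimizes $M$ itself, via semidefinite programming); `eta[i][j]` is 1 for positive pairs and 0 for negative pairs:

```python
import numpy as np

def d2(M, x, y):
    # squared Mahalanobis distance (x - y)^T M (x - y)
    v = x - y
    return v @ M @ v

def lmnn_objective(M, X, eta, c=1.0):
    n = len(X)
    # Pull term: sum of squared distances over positive pairs
    pull = sum(eta[i][j] * d2(M, X[i], X[j])
               for i in range(n) for j in range(n) if i != j)
    # Push term: hinge h[z] = max(0, z); the selector eta[i][j] * (1 - eta[i][k])
    # is 1 only when (x_i, x_j) is positive and (x_i, x_k) is negative
    push = sum(eta[i][j] * (1 - eta[i][k]) *
               max(0.0, 1 + d2(M, X[i], X[j]) - d2(M, X[i], X[k]))
               for i in range(n) for j in range(n) for k in range(n)
               if i != j and i != k)
    return pull + c * push

X = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 3.0])]
eta = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]  # (x0, x1) positive; pairs with x2 negative

# pull = 2 (both orderings of the positive pair), push = 0
# since x2 is already more than a margin of 1 farther away
print(lmnn_objective(np.eye(2), X, eta))  # 2.0
```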


์ฒ˜์Œ์˜ ์‹œ๋„์™€ ๋น„๊ตํ–ˆ์„ ๋•Œ, LMNN๋Š” ์ข€๋” ๋™์ ์ด๋ผ๊ณ  ๋งํ•  ์ˆ˜ ์žˆ๋‹ค.

์•ž์˜ ์‹œ๋„์—์„  negative-pair์— ๋Œ€ํ•ด ๊ฑฐ๋ฆฌ๊ฐ’์ด $1$์ด๋ผ๋Š” ์ง€์ •๋œ ๊ฐ’๋ณด๋‹ค ํฌ๊ธฐ๋งŒ ํ•˜๋ฉด ์ถฉ๋ถ„ํ–ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋Ÿฐ ์ ‘๊ทผ์€ ํ—ˆ์ ์ด ์žˆ๋Š”๋ฐ, positive-pair์˜ ๊ฑฐ๋ฆฌ๊ฐ’์ด ๋„์ €ํžˆ 1๋ณด๋‹ค ์ขํ˜€์ง€์ง€ ์•Š์„ ์ˆ˜๋„ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค; positive-pair์— ๋Œ€ํ•ด์„  constraint๋ฅผ ๋งŒ์กฑํ•˜๋ฉด์„œ ๊ฑฐ๋ฆฌํ•ฉ์ด ์ค„์–ด๋“ค๊ธฐ๋งŒ ํ•˜๋ฉด ๋œ๋‹ค๋Š” ์ ์„ ์ƒ๊ธฐํ•˜๋ผ.

Hence the approach is criticized for allowing situations where a negative pair ends up closer than a positive pair.


LMNN์—์„  ์ด๊ฒƒ์„ ๊ทน๋ณตํ•ด negative-pair๊ฐ€ positive-pair๋ณด๋‹ค Margin $1$๋งŒํผ ๋–จ์–ด์ง€๋„๋ก ์ตœ์ ํ™”ํ•œ๋‹ค.

์ด๋•Œ, negative-pair์˜ ๊ฑฐ๋ฆฌ๊ฐ’์˜ ๊ธฐ์ค€์ด ๋˜๋Š” positive-pair์˜ ๊ฑฐ๋ฆฌ๊ฐ’์ด ๊ณ ์ •๋œ ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ๋™์ ์œผ๋กœ ๋ณ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— LMNN์€ ๋” ๋™์ ์œผ๋กœ ์ž‘๋™ํ•œ๋‹ค๊ณ  ํ‰๊ฐ€ํ•œ๋‹ค.



Metric Learning + DL

\[D_M (\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^T M (\mathbf{x}_i - \mathbf{x}_j)}\]

๊ณ ์ „์ ์ธ Metric Learning์—์„  Mahalanobis distance์˜ $M$ ๊ฐ’์„ ๊ตฌํ•˜๋Š” ์ตœ์ ํ™” ๋ฌธ์ œ์— ์ง‘์ค‘ํ–ˆ๋‹ค.

\[D\left(f(\mathbf{x}_i), f(\mathbf{x}_j)\right)\]

๊ณ ์ „์ ์ธ ๋ฐฉ๋ฒ•์—์„œ๋„ feataure extractor $f$๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ๋Š” ํ–ˆ์ง€๋งŒ, ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ๋ฅผ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜, ์ง์ ‘ ๋””์ž์ธํ•œ Image descriptor๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค.


Metric Learning์—์„œ DL์ด ๋„์ž…๋˜๊ณ ๋ถ€ํ„ฐ๋Š” distance metric์„ ํ•™์Šตํ•˜๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ, feature extractor $f$๋ฅผ ํ•™์Šต์‹œํ‚ค๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๋ฐœ์ „ํ•ด์™”๋‹ค.

๋‹ค์Œ ํฌ์ŠคํŠธ์—์„  DL์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•˜๋Š” Metric Learning์— ๋Œ€ํ•ด ์ •๋ฆฌํ•œ๋‹ค.


  1. ์œ ์˜ํ•  ์ ์€ $M$๋Š” positive semi-definite matrix์—ฌ์•ผ ํ•œ๋‹ค๋Š” ์ ์ด๋‹ค. ์ด ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜์ง€ ์•Š๋Š”๋‹ค๋ฉด, ๋ณต์†Œ์ˆ˜์ธ Mahalanobis dist๋ฅผ ์–ป๋Š”๋‹คโ€ฆย