This post is a set of notes I put together for my own reference.




SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images (ECCV 2018) pdf

  • Unofficial github: not the official implementation; the model was implemented independently by someone else.
    • SphereConv2D & SphereMaxPool2D
      • Kernel Pattern Look-up Table
    • Uniform sampling
    • OmniMNIST
    • Spherical Object Detection

Kernel Sampling Method

  "The central idea of SphereNet is to lift local CNN operations from the regular image domain to the sphere surface where omnidirectional images can be represented without distortions."

  "This is achieved by representing the kernel as a small patch tangent to the sphere."

  • $S$: unit sphere
  • $S^2$: its surface
  • $\mathbf{s} = (\phi, \theta) \in S^2$
    • Saying that a point $\mathbf{s}$ lies on $S^2$, with its coordinates given as latitude/longitude, is the same as saying that it lies "on the equirectangular image".
  • $\Pi$: tangent plane located at \(\mathbf{s}_{\Pi} = (\phi_{\Pi}, \theta_{\Pi})\)
    • $\mathbf{x}$ : a point on $\Pi$ by its coordinates $\mathbf{x} \in \mathbb{R}^2$
    • $\Pi_0$: tangent plane located at $\mathbf{s} = (0 ,0)$.

  "A point $\mathbf{s}$ on the sphere is related to its tangent plane coordinates $\mathbf{x}$ via a gnomonic projection."

($\mathbf{s}$: point on sphere) $\equiv$ ($\mathbf{x}$ : coordinate on tangent plane)

Sampling at the center

Sampling at the center of the equirectangular image means sampling with step sizes $\Delta_{\theta}$ and $\Delta_{\phi}$.

\[\begin{aligned} \mathbf{s}_{(0, 0)} &= (0, 0) \\ \mathbf{s}_{(\pm 1, 0)} &= (\pm \Delta_{\theta}, 0) \\ \mathbf{s}_{(0, \pm 1)} &= (0, \pm \Delta_{\phi}) \\ \mathbf{s}_{(\pm 1, \pm 1)} &= (\pm \Delta_{\theta}, \pm \Delta_{\phi}) \end{aligned}\]
Notation used in the actual paper:
\[\begin{aligned} \mathbf{s}_{(0, 0)} &= (0, 0) \\ \mathbf{s}_{(\pm 1, 0)} &= (\pm \Delta_{\phi}, 0) \\ \mathbf{s}_{(0, \pm 1)} &= (0, \pm \Delta_{\theta}) \\ \mathbf{s}_{(\pm 1, \pm 1)} &= (\pm \Delta_{\phi}, \pm \Delta_{\theta}) \end{aligned}\]

The paper's notation does not quite match the formulas below, so this post swaps the order of $\phi$ and $\theta$.


From here we can compute where each sampling point lies on the tangent plane $\Pi_0$. Using the gnomonic projection, the result is as follows.

$$ \begin{aligned} x(\theta, \phi) &= \frac{\cos \phi \sin (\theta - \theta_{\Pi_0})}{\sin \phi_{\Pi_0} \sin \phi + \cos \phi_{\Pi_0}\cos \phi \cos (\theta - \theta_{\Pi_0})} \\ \\ y(\theta, \phi) &= \frac{\cos \phi_{\Pi_0} \sin \phi - \sin \phi_{\Pi_0}\cos \phi \cos (\theta - \theta_{\Pi_0})}{\sin \phi_{\Pi_0} \sin \phi + \cos \phi_{\Pi_0}\cos \phi \cos (\theta - \theta_{\Pi_0})} \end{aligned} $$


The sampling pattern \(\mathbf{s}_{(j, k)}\) therefore induces the following kernel pattern $\mathbf{x}_{(j, k)}$.

\[\begin{aligned} \mathbf{x}_{(0, 0)} &= (0, 0) \\ \mathbf{x}_{(\pm 1, 0)} &= (\pm \tan \Delta_{\theta}, 0) \\ \mathbf{x}_{(0, \pm 1)} &= (0, \pm \tan \Delta_{\phi}) \\ \mathbf{x}_{(\pm 1, \pm 1)} &= (\pm \tan \Delta_{\theta}, \pm \sec \Delta_{\theta} \tan \Delta_{\phi}) \end{aligned}\]
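The kernel pattern can also be checked numerically. The following is a minimal Python sketch, not the paper's or the unofficial repository's code: the function names `gnomonic` and `kernel_pattern` and the 10-degree step size are assumptions made only for illustration, with $\theta$ as longitude and $\phi$ as latitude as in this post.

```python
import math

def gnomonic(theta, phi, theta_c=0.0, phi_c=0.0):
    """Project the sphere point (theta, phi) onto the plane tangent at (theta_c, phi_c)."""
    # Shared denominator: cosine of the angular distance to the tangent point.
    cos_c = (math.sin(phi_c) * math.sin(phi)
             + math.cos(phi_c) * math.cos(phi) * math.cos(theta - theta_c))
    x = math.cos(phi) * math.sin(theta - theta_c) / cos_c
    y = (math.cos(phi_c) * math.sin(phi)
         - math.sin(phi_c) * math.cos(phi) * math.cos(theta - theta_c)) / cos_c
    return x, y

def kernel_pattern(d_theta, d_phi):
    """3x3 kernel pattern x_(j,k) on Pi_0 from s_(j,k) = (j * d_theta, k * d_phi)."""
    return {(j, k): gnomonic(j * d_theta, k * d_phi)
            for j in (-1, 0, 1) for k in (-1, 0, 1)}

# Hypothetical step size, chosen only for illustration.
d_theta = d_phi = math.radians(10.0)
pattern = kernel_pattern(d_theta, d_phi)
for jk in sorted(pattern):
    print(jk, pattern[jk])
```

The corner entries come out as $(\pm \tan \Delta_{\theta}, \pm \sec \Delta_{\theta} \tan \Delta_{\phi})$, so the kernel is not a regular grid on the tangent plane even at the image center.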
Deriving the formulas

The formulas above did not feel intuitive to me, so let me derive them directly.

First, consider the case $\mathbf{s}_{(0, 0)} = (0, 0)$.

Here $(\theta_{\Pi_0}\, , \phi_{\Pi_0}) = (0, 0)$.

Now substitute into the formula.

\[\begin{aligned} x(0, 0) &= \frac{\cos 0\sin (0 - 0)}{\sin 0 \sin 0+ \cos 0\cos 0\cos (0- 0)} = 0\\ \\ y(0, 0) &= \frac{\cos 0 \sin 0- \sin 0\cos 0\cos (0- 0)}{\sin 0 \sin 0+ \cos 0\cos 0\cos (0- 0)} = 0 \end{aligned}\]

Every term in the numerators becomes 0, which gives the result $\mathbf{x}_{(0, 0)} = (0, 0)$ stated above.

Before moving on, let us simplify the formula for $(\theta_{\Pi_0}\, , \phi_{\Pi_0}) = (0, 0)$.

\[\begin{aligned} x(\theta, \phi) &= \frac{\cos \phi \sin (\theta - 0)}{\sin 0 \sin \phi + \cos 0\cos \phi \cos (\theta - 0)} \\ &= \frac{\cos \phi \sin \theta}{\cos \phi \cos \theta} = \frac{\sin \theta}{\cos \theta} = \tan \theta \\ \\ y(\theta, \phi) &= \frac{\cos 0 \sin \phi - \sin 0\cos \phi \cos (\theta - 0)}{\sin 0 \sin \phi + \cos 0\cos \phi \cos (\theta - 0)} \\ &= \frac{\sin \phi}{\cos \phi \cos \theta} = \frac{\tan \phi}{\cos \theta} \end{aligned}\]

Next, consider the case \(\mathbf{s}_{(\pm 1, 0)} = (\pm \Delta_{\theta}, 0)\).

Again, $(\theta_{\Pi_0}\, , \phi_{\Pi_0}) = (0, 0)$.

Substituting into the formula gives the following.

\[\begin{aligned} x(\pm \Delta_{\theta}, 0) &= \frac{\sin \theta}{\cos \theta} = \frac{\sin (\pm \Delta_{\theta})}{\cos (\pm \Delta_{\theta})} = \pm \tan \Delta_{\theta}\\ \\ y(\pm \Delta_{\theta}, 0) &= \frac{\tan 0}{\cos (\pm \Delta_{\theta})} = 0 \end{aligned}\]

This gives the result \(\mathbf{x}_{(\pm 1, 0)} = (\pm \tan \Delta_{\theta}, 0)\) stated above.
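The two remaining cases follow in the same way from the simplified formulas $x = \tan \theta$ and $y = \tan \phi / \cos \theta$, which hold for $(\theta_{\Pi_0}\, , \phi_{\Pi_0}) = (0, 0)$. A short worked version, in this post's notation:

\[\begin{aligned} x(0, \pm \Delta_{\phi}) &= \tan 0 = 0 \\ y(0, \pm \Delta_{\phi}) &= \frac{\tan (\pm \Delta_{\phi})}{\cos 0} = \pm \tan \Delta_{\phi} \\ \\ x(\pm \Delta_{\theta}, \pm \Delta_{\phi}) &= \tan (\pm \Delta_{\theta}) = \pm \tan \Delta_{\theta} \\ y(\pm \Delta_{\theta}, \pm \Delta_{\phi}) &= \frac{\tan (\pm \Delta_{\phi})}{\cos (\pm \Delta_{\theta})} = \pm \sec \Delta_{\theta} \tan \Delta_{\phi} \end{aligned}\]

The sign of $y$ follows the sign of $\Delta_{\phi}$ only, since $\cos$ is even. These match \(\mathbf{x}_{(0, \pm 1)}\) and \(\mathbf{x}_{(\pm 1, \pm 1)}\) in the kernel pattern.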


Inverse gnomonic projection

So far the kernel center was fixed at \(\mathbf{s}_{\Pi_0} = (0, 0)\); now place it at an arbitrary \(\mathbf{s}_{\Pi} = (\theta_{\Pi}, \phi_{\Pi})\).

The sampling pattern $\mathbf{s} = (\theta, \phi)$ can then be derived from the kernel pattern $\mathbf{x} = (x, y)$ as follows.

$$ \begin{aligned} \theta(x, y) &= \theta_{\Pi} + \tan^{-1} {\left( \frac{x \sin \nu}{\rho \cos \phi_{\Pi} \cos \nu - y \sin \phi_{\Pi} \sin \nu} \right)} \\ \phi(x, y) &= \sin^{-1} {\left( \cos \nu \sin \phi_{\Pi} + \frac{y \sin \nu \cos \phi_{\Pi}}{\rho} \right)} \\ \textrm{where} \quad & \rho = \sqrt{x^2 + y^2} \quad \textrm{and} \quad \nu = \tan^{-1} \rho \end{aligned} $$

The expression is fairly complicated, so let us look at the simple case $\mathbf{s}_{\Pi_0} = (0, 0)$.

$$ \begin{aligned} \theta(x, y) &= 0 + \tan^{-1} {\left( \frac{x \sin \nu}{\rho \cos 0 \cos \nu - y \sin 0 \sin \nu} \right)} \\ &= \tan^{-1} {\left( \frac{x \sin \nu}{\rho \cos \nu} \right)} \\ \phi(x, y) &= \sin^{-1} {\left( \cos \nu \sin 0 + \frac{y \sin \nu \cos 0}{\rho} \right)} \\ &= \sin^{-1} {\left( \frac{y \sin \nu}{\rho} \right)} \\ \textrm{where} \quad & \rho = \sqrt{x^2 + y^2} \quad \textrm{and} \quad \nu = \tan^{-1} \rho \end{aligned} $$

In fact, this formula is the inverse of the one seen earlier, which derives the kernel pattern $\mathbf{x} = (x, y)$ from the sampling pattern $\mathbf{s} = (\theta, \phi)$.
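The inverse relationship can be checked with a round trip. This is a minimal sketch under stated assumptions: the function names and the test values are hypothetical, and `math.atan2` is used in place of the plain $\tan^{-1}$ of the formula so that the quadrant stays correct when the denominator is negative.

```python
import math

def gnomonic(theta, phi, theta_c, phi_c):
    """Forward projection: sphere point (theta, phi) -> tangent-plane point (x, y)."""
    cos_c = (math.sin(phi_c) * math.sin(phi)
             + math.cos(phi_c) * math.cos(phi) * math.cos(theta - theta_c))
    x = math.cos(phi) * math.sin(theta - theta_c) / cos_c
    y = (math.cos(phi_c) * math.sin(phi)
         - math.sin(phi_c) * math.cos(phi) * math.cos(theta - theta_c)) / cos_c
    return x, y

def inverse_gnomonic(x, y, theta_c, phi_c):
    """Inverse projection: tangent-plane point (x, y) -> sphere point (theta, phi)."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        # The plane origin maps to the tangent point itself (avoids dividing by rho).
        return theta_c, phi_c
    nu = math.atan(rho)
    theta = theta_c + math.atan2(
        x * math.sin(nu),
        rho * math.cos(phi_c) * math.cos(nu) - y * math.sin(phi_c) * math.sin(nu))
    phi = math.asin(math.cos(nu) * math.sin(phi_c)
                    + y * math.sin(nu) * math.cos(phi_c) / rho)
    return theta, phi

# Hypothetical tangent point and plane coordinates, chosen only for illustration.
theta_c, phi_c = 0.3, 0.5
theta, phi = inverse_gnomonic(0.1, -0.2, theta_c, phi_c)
x_back, y_back = gnomonic(theta, phi, theta_c, phi_c)
print(x_back, y_back)
```

Projecting the recovered $(\theta, \phi)$ forward again should reproduce the starting plane coordinates up to floating-point error.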


  "Several recent works also consider adapting the sampling locations of convolutional networks, ... Unlike our work, these methods need to learn the sampling locations during training, ... In contrast, we take advantage of the geometric properties of the camera to inject this knowledge explicitly into the network architecture."

  "it is straightforward to allow a filter to sample data across the image boundary. This eliminates any discontinuities ... and improves recognition of objects (which are split at the sides of an equirectangular image representation) or (which are positioned very close to the poles)."
  "In our experimental evaluation, we demonstrate how an (object detector trained on perspective images) can be successfully applied to the omnidirectional case."

Implementation

  "Implementation: As the sampling locations are fixed according to the geometry of the spherical image representation, they can be precomputed for each kernel location at every layer of the network."
  "it is sufficient to calculate and store the sampling locations once per row and then translate them. We store the sampling locations in look-up tables."

In an equirectangular image, every kernel position in the same row uses the same sampling offsets. In other words, the offsets are computed once per row, stored in a look-up table, and read back when needed.
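A per-row look-up table along these lines can be sketched in plain Python. This is not the paper's implementation or the unofficial repository's: the function names, the 32x64 image size, and the pixel-to-angle convention (row 0 at the north pole, pixel centers at half-integer offsets, step sizes equal to one pixel) are all assumptions made for illustration.

```python
import math

def inverse_gnomonic(x, y, theta_c, phi_c):
    """Map a tangent-plane point (x, y) back to (theta, phi) on the sphere."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return theta_c, phi_c
    nu = math.atan(rho)
    theta = theta_c + math.atan2(
        x * math.sin(nu),
        rho * math.cos(phi_c) * math.cos(nu) - y * math.sin(phi_c) * math.sin(nu))
    sin_phi = (math.cos(nu) * math.sin(phi_c)
               + y * math.sin(nu) * math.cos(phi_c) / rho)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return theta, math.asin(max(-1.0, min(1.0, sin_phi)))

def build_row_table(height, width):
    """For each row, store the 3x3 taps as (row coordinate, column offset) pairs."""
    d_theta, d_phi = 2 * math.pi / width, math.pi / height
    table = []
    for r in range(height):
        phi_c = math.pi / 2 - (r + 0.5) * d_phi  # latitude of the row center
        taps = {}
        for j in (-1, 0, 1):
            for k in (-1, 0, 1):
                # Kernel pattern x_(j,k) on the tangent plane.
                x = math.tan(j * d_theta)
                y = math.tan(k * d_phi) / math.cos(j * d_theta)
                theta, phi = inverse_gnomonic(x, y, 0.0, phi_c)
                taps[(j, k)] = ((math.pi / 2 - phi) / d_phi - 0.5, theta / d_theta)
        table.append(taps)
    return table

def sampling_coords(table, r, c, width):
    """Translate the stored row pattern to column c, wrapping around in longitude."""
    return {jk: (row, (c + dcol) % width) for jk, (row, dcol) in table[r].items()}

table = build_row_table(32, 64)
coords = sampling_coords(table, 16, 0, 64)
print(table[0][(1, 0)], table[16][(1, 0)])
print(coords[(-1, 0)])
```

The returned coordinates are fractional, so a real layer would still need interpolation to read pixel values. The modulo on the column is what lets a kernel at the left edge sample from the right edge of the image.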



Experiment

Spherical Image Classification

Simply replace the Conv and Pool layers with SphereConv and SpherePool.

Nothing special.

Spherical Object Detection

The paper proposes the Spherical Single Shot MultiBox Detector (Sphere-SSD).

  "in contrast to the original SSD, anchor boxes are now placed on tangent planes of the sphere and are defined in terms of spherical coordinates of their respective tangent plane."
  "In order to match anchor boxes to ground-truth detections, we select the anchor box closest to each ground-truth box. During inference, we perform NMS. For evaluation, we use the IoU of (two polygonal regions) which are constructed from the (gnomonic projections of evenly spaced points along the rectangular BBox on the tangent plane)."

The interesting part is that, to compare IoU, the two polygonal regions are mapped onto the same tangent plane and the IoU is computed there, rather than on the equirectangular image.
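Once both regions are on a common plane, the IoU reduces to a 2D polygon problem. The sketch below is only an illustration: the paper does not say how it intersects the polygons, so Sutherland-Hodgman clipping is an assumption here, as are the function names and the example squares. It assumes both polygons are convex and listed counter-clockwise.

```python
def polygon_area(poly):
    """Shoelace formula; poly is a list of (x, y) vertices."""
    return 0.5 * abs(sum(x0 * y1 - x1 * y0
                         for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1])))

def clip_convex(subject, clipper):
    """Sutherland-Hodgman: clip `subject` against a convex, counter-clockwise `clipper`."""
    def inside(p, a, b):
        # True if p is on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of line pq with line ab.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        current, output = output, []
        for p, q in zip(current, current[1:] + current[:1]):
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output

def polygon_iou(poly_a, poly_b):
    """IoU of two convex polygons given in the same tangent-plane coordinates."""
    inter = clip_convex(poly_a, poly_b)
    inter_area = polygon_area(inter) if len(inter) >= 3 else 0.0
    return inter_area / (polygon_area(poly_a) + polygon_area(poly_b) - inter_area)

# Hypothetical boxes already expressed on one shared tangent plane.
box_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
box_b = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)]
iou = polygon_iou(box_a, box_b)
print(iou)
```

The convexity assumption is reasonable on a gnomonic plane, because great-circle arcs map to straight lines there, so a box that is rectangular on its own tangent plane stays a convex quadrilateral on a nearby one.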



Classification: Omni-MNIST

Skipped.

Object Detection: FlyingCars

Because datasets of 360° images are scarce, the authors say they built the FlyingCars dataset themselves.

It is reportedly made by simply overlaying 3D car models onto 360° images.

The results improve on existing detection methods for 360° images.

What stands out in the detection results is that the discontinuity problem of the equirectangular representation appears to be fully resolved!

Transfer Learning: OmPaCa

This experiment applies the spherical layers to a model trained on an existing perspective dataset.

For this experiment, the paper introduces a new dataset, Omnidirectional Parked Cars (OmPaCa).

A perspective SSD model trained on the KITTI dataset¹ is converted to a Sphere-SSD model and fine-tuned.


  1. An autonomous-driving benchmark dataset released by the Karlsruhe Institute of Technology (KIT) and the Toyota Technological Institute at Chicago (TTI-C).