This post organizes what I studied in the 'Statistical Data Mining' (ํ†ต๊ณ„์  ๋ฐ์ดํ„ฐ๋งˆ์ด๋‹) course at university in Spring 2021. Corrections are always welcome :)

"Clustering refers to a broad set of techniques for partitioning a dataset into homogeneous subgroups called clusters."

✨ How can we define "homogeneity"?

์ด๋ฒˆ ํฌ์ŠคํŠธ์—์„œ ์šฐ๋ฆฌ๋Š” โ€œclusteringโ€์„ ์ˆ˜ํ–‰ํ•˜๋Š” 3๊ฐ€์ง€ ๋ฐฉ์‹์— ๋Œ€ํ•ด ์‚ดํŽด๋ณผ ๊ฒƒ์ด๋‹ค.

  1. Distance-based clustering
  2. Hierarchical clustering
  3. Model-based clustering

K-means clustering

An algorithm for partitioning a dataset into $K$ subgroups.

Here, the number of clusters $K$ must be fixed in advance! (HOW?)


<K-means>์—์„œ ์šฐ๋ฆฌ๋Š” <within-cluster variation measure>๋ฅผ ์ •์˜ํ•ด ์‚ฌ์šฉํ•œ๋‹ค!!

For a set $C \subset \{ 1, 2, \dots, n\}$, the within-cluster variation measure $W(C)$ is

\[W(C) = \frac{1}{\left| C \right|} \sum_{\substack{i, j \in C \\ i < j}} \left\| x_i - x_j \right\|_2^2\]

※ A small value of $W(C)$ may represent a good cluster.

In other words, <within-cluster variation> is a measure of the variance within a cluster!


์šฐ๋ฆฌ๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ $C_1, \dots, C_K$์˜ ์ง‘ํ•ฉ์œผ๋กœ ๋ถ„ํ• ํ•œ๋‹ค๋ฉด, ์ด๊ฒƒ์€ ์•„๋ž˜์˜ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๊ฒƒ๊ณผ ๊ฐ™๋‹ค.

\[\underset{C_1, \dots, C_K}{\text{minimize}} \left\{ \sum_{k=1}^K W(C_k) \right\}\]
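To make the objective concrete, here is a minimal NumPy sketch of my own (the function names are made up) that evaluates $W(C)$ for a single cluster, both in its pairwise form and in the equivalent centroid form that appears later in this post:

```python
import numpy as np

def W_pairwise(X):
    """Pairwise form: (1/|C|) * sum over pairs i < j in C of ||x_i - x_j||^2."""
    n = len(X)
    total = sum(np.sum((X[i] - X[j]) ** 2)
                for i in range(n) for j in range(i + 1, n))
    return total / n

def W_centroid(X):
    """Equivalent centroid form: sum over i in C of ||x_i - x_bar||^2."""
    return np.sum((X - X.mean(axis=0)) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))          # one toy cluster of 10 points in R^2
print(W_pairwise(X), W_centroid(X))   # the two values agree
```

The total objective is then just the sum of $W(C_k)$ over the $K$ clusters.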

Algorithm. Brute Force

Approaching this naively, the number of ways to assign $n$ data points to $K$ clusters is $K^n$. That is, the naive approach has exponential complexity, making this a very hard problem ㅠㅠ

So instead of brute force, <K-means clustering> finds a sub-optimal solution to the problem above!

Algorithm. K-means Algorithm

1. Randomly partition $\{1, \dots, n \}$ into $K$ clusters.

2. Repeat the following until convergence.
　　- For each cluster, compute the cluster centroid.
　　- Assign each observation to the cluster whose centroid is the closest.

figure from ISLR

๊ทธ๋ฆผ์„ ๋ณด๋ฉด, 2๋ฒˆ๋งŒ์— ์ˆ˜๋ ดํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค!!

✨ NOTE: each step of the <K-means algorithm> is guaranteed to decrease (or at least never increase) the objective, because

\[W(C_k) = \frac{1}{\left| C_k \right|} \sum_{\substack{i, j \in C_k \\ i < j}} \left\| x_i - x_j \right\|_2^2 = \sum_{i \in C_k} \left\| x_i - \bar{x}_k \right\|_2^2\]

where $\bar{x}_k$ is the mean of cluster $C_k$.

์‹ ์œ ๋„

์ถ”ํ›„์— ์—…๋ฐ์ดํŠธ!

<K-means algorithm>์€ global optimum์ด ์•„๋‹ˆ๋ผ local optimum์„ ์ฐพ๋Š” ๊ฒƒ์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ๋‹ค.

<K-means algorithm>์˜ ๊ฒฐ๊ณผ๋Š” initial cluster์— ๋”ฐ๋ผ ๋ฐ”๋€Œ๊ธฐ ๋•Œ๋ฌธ์—, ์„œ๋กœ ๋‹ค๋ฅธ ๋ช‡๊ฐ€์ง€ initial cluster๋กœ ์—ฌ๋ ค๋ฒˆ ๋Œ๋ ค๋ณด๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋œ๋‹ค.

figure from ISLR

๊ทธ๋ฆผ์„ ๋ณด๋ฉด, ํ•˜๋‚˜์˜ ํ˜•ํƒœ๋กœ cluster๊ฐ€ ๊ฒฐ์ •๋˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋‹ค.
์œ„ ๊ทธ๋ฆผ์—์„œ๋Š” 2, 3, 4, 5๋ฒˆ์งธ์˜ ๊ฒฝ์šฐ๋กœ clustering ๋˜๋Š” ๊ฒƒ์ด ์šฐ์„ธํ•˜๋‹ค.


Hierarchical Clustering

<Hierarchical Clustering>์—์„œ๋Š” โ€œdendrogramโ€์ด๋ผ๋Š” ๊ฒƒ์„ ํ†ตํ•ด cluster๋ฅผ ๋‚˜๋ˆ„๊ณ , ๋˜ ๊ทธ ๊ณ„์ธต์„ ๋ถ„์„ํ•œ๋‹ค.

figure from ISLR

์œ„ ๊ทธ๋ฆผ์„ ๋ฐ”ํƒ•์œผ๋กœ ์–ด๋–ป๊ฒŒ <Hierarchical Clustering>์„ ์ˆ˜ํ–‰ํ•˜๊ณ , โ€œdendrogramโ€์„ ๋งŒ๋“œ๋Š”์ง€ ์‚ดํŽด๋ณด์ž.

๋จผ์ €, ์œ„์˜ 2์ฐจ์› ๋ฐ์ดํ„ฐ ๋ถ„ํฌ์—์„œ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๋‘ ์ ์„ ์ฐพ๋Š”๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” (5, 7)์ด๋‹ค. ๋‘ ์ ์„ ์™ผ์ชฝ์˜ dendrogram์— ํ‘œ์‹œํ•˜๊ณ , ๋‘ ์ ์„ ํ•˜๋‚˜์˜ cluster๋กœ ๋ฌถ๋Š”๋‹ค. cluster ํ•˜๋‚˜๊ฐ€ ์ถ”๊ฐ€๋œ ๋ฐ์ดํ„ฐ ๋ถ„ํฌ์—์„œ ๋˜ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๋‘ ์ ์„ ์ฐพ๋Š”๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” (1, 6)์ด๋‹ค. ์ด ๋‘ ์ ์„ ์™ผ์ชฝ์˜ dendrogram์— ํ‘œ์‹œํ•˜๊ณ , ๋˜ ํ•˜๋‚˜์˜ cluster๋กœ ๋ฌถ๋Š”๋‹ค. ๋˜ ์ง„ํ–‰ํ•˜๋ฉด, ์ด๋ฒˆ์—๋Š” ((5, 7), 8)์ด ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์Œ์ด๋‹ค. ์ด๊ฒƒ์„ ์™ผ์ชฝ์˜ dendrogram์— ํ‘œ์‹œํ•œ๋‹ค. โ€ฆ (continue) โ€ฆ

๋ฐฉ๋ฒ•์€ ์ •๋ง ๊ฐ„๋‹จํ•˜๋‹ค!! ํ•˜์ง€๋งŒ, ์—ฌ๊ธฐ์„œ๋„ ๋ช‡๊ฐ€์ง€ ์ด์Šˆ๊ฐ€ ์žˆ๋‹ค. ๊ทธ๊ฒƒ์€ ๋ฐ”๋กœ โ€œcluster-point ๊ฑฐ๋ฆฌ๋ฅผ ์–ด๋–ป๊ฒŒ ์ธก์ •ํ•  ๊ฒƒ์ธ๊ฐ€โ€๋‹ค!


๋‘ โ€˜์ โ€™์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •ํ•˜๋Š” ๊ฒƒ์€ ๊ฐ„๋‹จํ•˜๊ฒŒ ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ๋กœ ์ธก์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

๊ทธ๋Ÿฌ๋‚˜ cluster-point, cluster-cluster์˜ ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •ํ•˜๋Š” ๊ฒƒ์€ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

  • Complete linkage: maximal intercluster dissimilarity
  • Single linkage: minimal intercluster dissimilarity
  • Average linkage: mean intercluster dissimilarity
  • Centroid linkage: dissimilarity between the centroids

Usually, "complete linkage" or "average linkage" is used. "centroid linkage" is not recommended, because the merge order can be reversed (an inversion can occur).

Cluster ์‚ฌ์ด ๊ฑฐ๋ฆฌ๋Š” non-decreasing ํ•ด์•ผ ํ•˜๋Š”๋ฐ, ์ด ๊ฒฝ์šฐ๋Š” 8L ๋‹ค์Œ์— 7L์ด ๋˜์–ด inversion์ด ์ƒ๊ฒผ๋‹ค.


Spectral Clustering

💥 This is a really unusual clustering method. The material is a bit difficult, so let's focus!

Traditional clustering (e.g., K-means) does not work well when clusters are non-convex. <Spectral Clustering> is designed for these situations.

I'm busy right now, so I'll pass on the details for the moment!
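Still, the motivating claim above is easy to see on toy data with scikit-learn. A quick sketch of mine using the classic non-convex "two moons" dataset:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

# two interleaving half-circles: non-convex clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means splits the moons with a straight boundary, while spectral
# clustering on a nearest-neighbor graph recovers each moon
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                               random_state=0).fit_predict(X)
```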


์ด๋ฒˆ ํฌ์ŠคํŠธ์—์„œ ์‚ดํŽด๋ณธ ๋ฐฉ์‹์€ ๋ชจ๋‘ ํœด๋ฆฌ์Šคํ‹ฑ ๋ฐฉ๋ฒ•๋“ค์ด๋‹ค. ์ด์–ด์ง€๋Š” ํฌ์ŠคํŠธ์—์„œ๋Š” ๋ชจ๋ธ(model)์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” <Model-based clustering>์„ ์‚ดํŽด๋ณธ๋‹ค. <Mixture Model>, <EM-Algorithm> ๋“ฑ ๋ฌด์‹œ๋ฌด์‹œํ•œ ๊ฒƒ๋“ค์ด ์™•์ฐฝ ๋‚˜์˜จ๋‹ค;;

👉 Model-based Clustering