This post is a write-up of what I studied in the β€˜Statistical Data Mining’ course I took at university in Spring 2021. Corrections and comments are always welcome :)



An overview of <Boosting> is available in the post below 😁

πŸ‘‰ Boosting

AdaBoost

<AdaBoost> stands for β€œAdaptive Boosting”. It is the first Boosting model, developed in 1996 by Schapire & Freund, who first devised the <Boosting> algorithm.


μ•Œκ³ λ¦¬μ¦˜μ„ μ‚΄νŽ΄λ³΄κΈ° 전에 λ¨Όμ € 셋업을 ν™•μΈν•˜μž.

[Data]

$\{ (x_1, y_1), \dots, (x_n, y_n) \}$

  • p-dimensional input: $x_i \in \mathcal{X}$
  • Binary output: $y_i \in \{ -1, 1 \}$ // encoded as +1 / -1

[Weak Classifier]

$m$-th Weak Classifier $G_m : \mathcal{X} \rightarrow \{-1, 1 \}$

[Prediction]

Predictions are based on a weighted majority vote of the weak classifiers!

\[G(x) = \text{sign} \left( \sum^M_{m=1} \alpha_m G_m (x) \right)\]

where $\alpha_m$ is the weight of the $m$-th weak classifier.

Algorithm. AdaBoost

Initialize the weights for data points
$w_i = 1 / n$ for $i=1, \dots, n$.

// training!
For $m=1$ to $M$
   Obtain $G_m(x)$ based on the training data using weights $w_i$
   Compute the error of the $m$-th classifier:

\[\text{err}_m = \frac{\displaystyle\sum^n_{i=1} w_i \cdot I(y_i \ne G_m(x_i))}{\displaystyle\sum^n_{i=1} w_i}\]

   Compute the weight of the $m$-th classifier:

\[\alpha_m = \log \left( \frac{1 - \text{err}_m}{\text{err}_m}\right)\]

   // correctly classified $w_i$ stay the same while misclassified $w_i$ grow β†’ after normalization β†’ correctly classified $w_i$ β–Ό, misclassified $w_i$ β–²

   Update weights for data points

\[w_i \leftarrow w_i \cdot \exp \left[ \alpha_m \cdot I(y_i \ne G_m(x_i)) \right] \quad \text{for} \quad i=1, \dots, n\]

Output $\displaystyle G(x) = \text{sign} \left[ \sum^M_{m=1} \alpha_m G_m(x) \right]$
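
Below is a minimal NumPy/scikit-learn sketch of the algorithm above (my own illustration, not code from the course); depth-1 decision trees play the role of the weak classifiers $G_m$, and the data weights are renormalized every round, which is equivalent to dividing by $\sum_i w_i$ in $\text{err}_m$.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # depth-1 stumps as weak learners G_m

def adaboost_fit(X, y, M=50):
    """AdaBoost with decision stumps; y must be encoded as {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # initialize w_i = 1/n
    stumps, alphas = [], []
    for m in range(M):
        # Obtain G_m from the training data using the current weights w_i
        G_m = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (G_m.predict(X) != y)
        # err_m: weighted misclassification rate
        err = np.sum(w * miss) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)       # guard against log(0)
        alpha = np.log((1 - err) / err)            # alpha_m = log((1 - err_m) / err_m)
        # misclassified points get larger weights; then renormalize
        w = w * np.exp(alpha * miss)
        w /= np.sum(w)
        stumps.append(G_m)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """G(x) = sign( sum_m alpha_m * G_m(x) )."""
    return np.sign(sum(a * g.predict(X) for a, g in zip(alphas, stumps)))
```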


Boosting as Additive Expansion

<AdaBoost> μ•Œκ³ λ¦¬μ¦˜μ˜ 결과인 $G(x)$

\[G(x) = \text{sign} \left[ \sum^M_{m=1} \alpha_m G_m(x) \right]\]

can also be understood as an additive/basis expansion model.

Algorithm. general additive function expansion

\[f(x) = \sum^M_{m=1} \beta_m \cdot h(x; \theta_m)\]

That is, we interpret $\alpha_m$ as the basis coefficient $\beta_m$ and the weak classifier $G_m(x)$ as the basis function $h(x; \theta_m)$!

Fitting an additive model directly by ERM (Empirical Risk Minimization) is inherently a hard problem.

\[\min_{\left\{\beta_m, \theta_m\right\}^M_{m=1}} \; \sum^n_{i=1} L \left( y_i, \sum^M_{m=1} \beta_m \cdot h(x_i; \theta_m) \right)\]

λŒ€λΆ€λΆ„μ˜ κ²½μš°μ—μ„œλŠ” computationally infeasible ν•˜λ‹€κ³  보면 λœλ‹€.

To work around this computational problem, one can

  • Update the expansion sequentially
  • Consider only a single basis function at each step

κ°€ μžˆλŠ”λ°, 이런 접근을 ν”νžˆ <Forward Stagewise Update>라고 ν•œλ‹€!

To sketch the idea briefly:

  1. Fit $\beta_1$ (and $\theta_1$)
  2. Keep $\beta_1$ fixed and fit $\beta_2$
  3. Keep $\beta_1, \beta_2$ fixed and fit $\beta_3$
  4. …

Algorithm. Forward Stagewise Update πŸ”₯

Suppose that the current expansion is

\[f_{m-1} (x) = \sum^{m-1}_{j=1} \beta_j \cdot h(x; \theta_j)\]

then, the next update can be achieved by solving

\[(\beta_m, \theta_m) = \underset{(\beta, \theta)}{\text{argmin}} \sum^n_{i=1} L \left( y_i, f_{m-1}(x_i) + \beta \cdot h(x_i; \theta) \right)\]

In other words, the <forward stagewise update> adds a new basis function without modifying the previously fitted terms.

Example. Forward Stagewise Update with Squared Error Loss

Consider the squared error loss:

\[L(y, f(x)) = (y - f(x))^2\]

then, the loss in forward stagewise is

\[L(y_i, f_{m-1} (x_i) + \beta \cdot h(x_i; \theta)) = \left( y_i - f_{m-1} (x_i) - \beta \cdot h(x_i; \theta) \right)^2\]

μ΄λ•Œ, current residual인 $(y_i - f_{m-1}(x_i))$λŠ” μƒμˆ˜κ°’μœΌλ‘œ $\beta, \theta$λ₯Ό μ°ΎλŠ”λ°μ— 영ν–₯을 주지 μ•ŠλŠ”λ‹€. λ”°λΌμ„œ μš°λ¦¬λŠ” $\beta, \theta$에 μ˜¨μ „νžˆ μ§‘μ€‘ν•˜λ©΄ λœλ‹€!

Example. Forward Stagewise Update with Exponential Loss πŸ”₯

Consider the exponential loss:

\[L(y, f(x)) = \exp (-y \cdot f(x))\]

μ΄λ•Œ! λ†€λžκ²Œλ„ <AdaBoost>λŠ” Forward Stagewise Updates for the ERM with Exponential Loss와 λ™μΉ˜λ‹€!! 😲

This result offers an entirely new way of looking at <Boosting>!

Note that the $m$-th step of the forward stagewise updates solves

\[\begin{aligned} (\beta_m , G_m) &= \underset{(\beta, G)}{\text{argmin}} \sum^n_{i=1} \exp \left[ -y_i \cdot \left( f_{m-1}(x_i) + \beta \cdot G(x_i) \right)\right] \\ &= \underset{(\beta, G)}{\text{argmin}} \sum^n_{i=1} w_i^{(m)} \cdot \exp \left[ - \, y_i \cdot \beta \cdot G(x_i) \right] \\ & \quad \text{where} \quad w_i^{(m)} = \exp \left[ -y_i \cdot f_{m-1}(x_i) \right] \end{aligned}\]

It can be shown that

\[G_m = \underset{G}{\text{argmin}} \sum^n_{i=1} w_i^{(m)} \cdot I(y_i \ne G(x_i))\] \[\beta_m = \frac{1}{2} \log \left( \frac{1-\text{err}_m}{\text{err}_m}\right)\]

where

\[\text{err}_m = \frac{\displaystyle\sum^n_{i=1} w_i^{(m)} \cdot I(y_i \ne G_m(x_i))}{\displaystyle\sum^n_{i=1} w_i^{(m)}}\]
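
To fill in the step behind β€œIt can be shown that”, split the weighted objective over correctly and incorrectly classified points:

\[\begin{aligned} \sum^n_{i=1} w_i^{(m)} \cdot \exp \left[ - \, y_i \cdot \beta \cdot G(x_i) \right] &= e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} \\ &= \left( e^{\beta} - e^{-\beta} \right) \sum^n_{i=1} w_i^{(m)} \cdot I(y_i \ne G(x_i)) + e^{-\beta} \sum^n_{i=1} w_i^{(m)} \end{aligned}\]

For any $\beta > 0$ only the first term depends on $G$, so the minimizing $G$ is the weighted-error minimizer $G_m$; plugging $G_m$ back in and setting the derivative with respect to $\beta$ to zero yields $\beta_m$ above.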

Then, the update is

\[f_m(x) = f_{m-1}(x) + \beta_m \cdot G_m (x)\]

and leads the weights for the new iterations to be

\[\begin{aligned} w_i^{(m+1)} &= w_i^{(m)} \cdot \exp \left[ - \, y_i \cdot \beta_m \cdot G_m (x_i) \right] \\ &= w_i^{(m)} \cdot \exp \left[ \alpha_m \cdot I(y_i \ne G_m(x_i)) \right] \cdot \exp \left[ - \beta_m \right] \end{aligned}\]

where $\alpha_m = 2 \beta_m$ is exactly the classifier weight that appears in the AdaBoost algorithm; the common factor $\exp(-\beta_m)$ scales every weight equally and so has no effect once the weights are normalized.

Q. Why Exponential Loss?

Population minimizer of the exponential loss:

\[\begin{aligned} f^{*} (x) &= \underset{f(x)}{\text{argmin}} \; E_{Y \mid x} \left[ L(Y, f(x)) \right] \\ &= \underset{f(x)}{\text{argmin}} \; E_{Y \mid x} \left[ \exp \left( -Y \cdot f(x) \right) \right] \\ &= \frac{1}{2} \log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} \end{aligned}\]

equivalently,

\[P(Y = 1 \mid x) = \frac{\exp \left( f^* (x) \right)}{\exp \left( -f^* (x) \right) + \exp \left( f^* (x) \right)}\]
Derivation

It is true that $P(Y=1 \mid x) = 1 - P(Y=-1 \mid x)$.

\[\begin{aligned} f^*(x) &= \frac{1}{2} \log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} \\ \exp \left( 2 \cdot f^* (x) \right) &= \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} = \frac{P(Y = 1 \mid x)}{1 - P(Y=1 \mid x)} \\ \exp \left( 2 \cdot f^* (x) \right) \cdot (1 - P(Y = 1 \mid x)) &= P(Y = 1 \mid x) \\ \exp \left( 2 \cdot f^* (x) \right) &= P(Y = 1 \mid x) + \exp \left( 2 \cdot f^* (x) \right) \cdot P(Y = 1 \mid x) \\ &= P(Y = 1 \mid x) \cdot \left( 1 + \exp \left( 2 \cdot f^* (x) \right) \right) \\ P(Y = 1 \mid x) &= \frac{\exp \left( 2 \cdot f^* (x) \right)}{1 + \exp \left( 2 \cdot f^* (x) \right)} = \frac{\exp \left( f^* (x) \right)}{\exp \left( -f^* (x) \right) + \exp \left( f^* (x) \right)} \end{aligned}\]

$\blacksquare$
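
As a small aside (my own sketch, not from the notes): this relation means a fitted boosting score $f(x)$ can be turned into a probability estimate through a sigmoid in $2 f(x)$, assuming $f(x)$ approximates $f^*(x)$.

```python
import numpy as np

def score_to_probability(f_x):
    """P(Y = 1 | x) = exp(f) / (exp(-f) + exp(f)) = 1 / (1 + exp(-2 f))."""
    return 1.0 / (1.0 + np.exp(-2.0 * np.asarray(f_x)))
```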


The one thing to remember about <AdaBoost> is that fitting <AdaBoost> is in fact β€œForward Stagewise Updates for the ERM with Exponential Loss”!! 😀

The GBM family that came later performs better, so AdaBoost itself is rarely used these days, I'm told πŸ˜₯