AdaBoost
This post organizes what I studied in the graduate course "Statistical Data Mining" in the spring semester of 2021. Corrections are always welcome :)
An overview of <Boosting> can be found in the post below:
Boosting
AdaBoost
<AdaBoost> is short for "Adaptive Boosting". It is the first Boosting model, developed in 1996 by Schapire & Freund, who first devised the <Boosting> algorithm.
Before looking at the algorithm, let's first check the definitions.
[Data]
$\{ (x_1, y_1), \dots, (x_n, y_n) \}$
- p-dimensional input: $x_i \in \mathcal{X}$
- Binary output: $y_i \in \{ -1, 1 \}$ // encoded as 1 and -1
[Weak Classifier]
$m$-th Weak Classifier $G_m : \mathcal{X} \rightarrow \{-1, 1 \}$
[Prediction]
predictions are based on a weighted majority vote!
\[G(x) = \text{sign} \left( \sum^M_{m=1} \alpha_m G_m (x) \right)\]where $\alpha_m$ is the weight of the $m$-th weak classifier.
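As a tiny illustration of this weighted vote, here is a minimal Python sketch (the stumps and weights below are made-up placeholders, not fitted values):

```python
import numpy as np

def weighted_vote(x, weak_classifiers, alphas):
    """G(x) = sign( sum_m alpha_m * G_m(x) ), with each G_m(x) in {-1, +1}."""
    score = sum(a * g(x) for g, a in zip(weak_classifiers, alphas))
    return np.sign(score)

# hypothetical 1-D stumps and weights, just to illustrate the vote
stumps = [lambda x: 1 if x > 0 else -1,
          lambda x: 1 if x > 2 else -1,
          lambda x: 1 if x < 1 else -1]
alphas = [0.8, 0.4, 0.2]
print(weighted_vote(1.5, stumps, alphas))   # 0.8 - 0.4 - 0.2 = 0.2 -> prints 1.0
```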
Algorithm. AdaBoost
Initialize the weights for data points
$w_i = 1 / n$ for $i=1, \dots, n$.
// training!
For $m=1$ to $M$
- Obtain $G_m(x)$ based on the training data using weights $w_i$
- Compute the error of the $m$-th classifier
\[\text{err}_m = \frac{\displaystyle\sum^n_{i=1} w_i \cdot I(y_i \ne G_m(x_i))}{\displaystyle\sum^n_{i=1} w_i}\]- Compute the weight for the $m$-th classifier
\[\alpha_m = \log \left( \frac{1 - \text{err}_m}{\text{err}_m}\right)\]- Update the weights for the data points
\[w_i \leftarrow w_i \cdot \exp \left[ \alpha_m \cdot I(y_i \ne G_m(x_i)) \right] \quad \text{for} \quad i=1, \dots, n\]// correctly classified $w_i$ stay the same, misclassified $w_i$ grow → normalize → correctly classified $w_i$ ▼, misclassified $w_i$ ▲
Output $\displaystyle G(x) = \text{sign} \left[ \sum^M_{m=1} \alpha_m G_m(x) \right]$
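To make the loop above concrete, here is a minimal sketch of the algorithm in Python, assuming depth-1 decision trees (stumps) from scikit-learn as the weak learners and labels encoded as $\{-1, 1\}$:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Minimal AdaBoost sketch; y must be encoded as {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # initialize w_i = 1/n
    stumps, alphas = [], []
    for m in range(M):
        # fit the m-th weak classifier G_m on the weighted data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y             # I(y_i != G_m(x_i))
        # weighted error err_m and classifier weight alpha_m
        err = np.clip(w[miss].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err)
        # boost the weights of misclassified points, then normalize
        w = w * np.exp(alpha * miss)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # weighted majority vote: G(x) = sign( sum_m alpha_m * G_m(x) )
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)
```

The `np.clip` guard is only there to avoid `log(0)` when a stump happens to classify the weighted sample perfectly.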
Boosting as Additive Expansion
The output of the <AdaBoost> algorithm, $G(x)$,
\[G(x) = \text{sign} \left[ \sum^M_{m=1} \alpha_m G_m(x) \right]\]can also be understood as an additive/basis expansion model.
Algorithm. General Additive Function Expansion
That is, we interpret $\alpha_m$ as a basis coefficient $\beta_m$ and the weak classifier $G_m(x)$ as a basis function $h(x; \theta_m)$!
In general, fitting an additive model by ERM (Empirical Risk Minimization) is a hard problem.
\[\min_{\left\{\beta_m, \theta_m\right\}^M_{m=1}} \; \sum^n_{i=1} L \left( y_i, \sum^M_{m=1} \beta_m \cdot h(x_i; \theta_m) \right)\]In most cases, this joint optimization is computationally infeasible.
To get around this computational difficulty, one can
- Update the expansion sequentially
- Consider only a single basis function at each step
This approach is called the <Forward Stagewise Update>!
Briefly, the procedure is:
- Fit $\beta_0$
- Keep $\beta_0$ fixed and fit $\beta_1$
- Keep $\beta_0, \beta_1$ fixed and fit $\beta_2$
- β¦
Algorithm. Forward Stagewise Update
Suppose that the current expansion is
\[f_{m-1} (x) = \sum^{m-1}_{j=1} \beta_j \cdot h(x; \theta_j)\]then, the next update can be achieved by solving
\[(\beta_m, \theta_m) = \underset{(\beta, \theta)}{\text{argmin}} \sum^n_{i=1} L \left( y_i, f_{m-1}(x_i) + \beta \cdot h(x_i; \theta) \right)\]That is, the <forward stagewise update> adds a new basis function without modifying the existing solution.
Example. Forward Stagewise Update with Squared Error Loss
Consider the squared error loss:
\[L(y, f(x)) = (y - f(x))^2\]then, the loss in forward stagewise is
\[L(y_i, f_{m-1} (x_i) + \beta \cdot h(x_i; \theta)) = \left( y_i - f_{m-1} (x_i) - \beta \cdot h(x_i; \theta) \right)^2\]Here, the current residual $(y_i - f_{m-1}(x_i))$ is a fixed constant, so finding $\beta, \theta$ reduces to fitting the new basis function $\beta \cdot h(x; \theta)$ to the current residuals!
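So, under squared error loss, each stage just fits the next basis function to the current residuals. A minimal sketch, using a depth-1 regression tree as a stand-in for $h(x; \theta)$ (with $\beta$ absorbed into the tree's leaf values):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def forward_stagewise_sq(X, y, M=100):
    """Forward stagewise fitting under squared error loss:
    at each step, fit h(x; theta_m) to the current residuals y - f_{m-1}(x)."""
    f = np.zeros(len(y))              # start from f_0(x) = 0
    basis = []
    for m in range(M):
        residual = y - f              # (y_i - f_{m-1}(x_i)) is a fixed target
        h = DecisionTreeRegressor(max_depth=1)
        h.fit(X, residual)
        basis.append(h)
        f += h.predict(X)             # f_m(x) = f_{m-1}(x) + h(x; theta_m)
    return basis

def predict_additive(X, basis):
    # the fitted additive expansion: sum of all basis functions
    return sum(h.predict(X) for h in basis)
```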
Example. Forward Stagewise Update with Exponential Loss π₯
Consider the exponential loss:
\[L(y, f(x)) = \exp (-y \cdot f(x))\]Here comes the surprise: <AdaBoost> is equivalent to Forward Stagewise Updates for the ERM with Exponential Loss!!
This statement offers a completely new perspective on <Boosting>!
Note that the $m$-th step of the forward stagewise updates solves
\[\begin{aligned} (\beta_m , G_m) &= \underset{(\beta, G)}{\text{argmin}} \sum^n_{i=1} \exp \left[ -y_i \cdot \left( f_{m-1}(x_i) + \beta \cdot G(x_i) \right)\right] \\ &= \underset{(\beta, G)}{\text{argmin}} \sum^n_{i=1} w_i^{(m)} \cdot \exp \left[ - \, y_i \cdot \beta \cdot G(x_i) \right] \\ & \quad \text{where} \quad w_i^{(m)} = \exp \left[ -y_i \cdot f_{m-1}(x_i) \right] \end{aligned}\]It can be shown that
\[G_m = \underset{G}{\text{argmin}} \sum^n_{i=1} w_i^{(m)} \cdot I(y_i \ne G(x_i))\] \[\beta_m = \frac{1}{2} \log \left( \frac{1-\text{err}_m}{\text{err}_m}\right)\]where
\[\text{err}_m = \frac{\displaystyle\sum^n_{i=1} w_i^{(m)} \cdot I(y_i \ne G_m(x_i))}{\displaystyle\sum^n_{i=1} w_i^{(m)}}\]Then, the update is
\[f_m(x) = f_{m-1}(x) + \beta_m \cdot G_m (x)\]and leads to the weights for the next iteration being
\[\begin{aligned} w_i^{(m+1)} &= w_i^{(m)} \cdot \exp \left[ - \, y_i \cdot \beta_m \cdot G_m (x_i) \right] \\ &= w_i^{(m)} \cdot \exp \left[ \alpha_m \cdot I(y_i \ne G_m(x_i)) \right] \cdot \exp \left[ - \beta_m \right] \end{aligned}\]where $\alpha_m = 2 \beta_m$ is exactly the classifier weight that appears in the AdaBoost algorithm; the factor $\exp[-\beta_m]$ is the same for every data point, so it has no effect once the weights are normalized.
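The two displayed results above ($G_m$ and $\beta_m$) follow from splitting the weighted exponential objective over correctly and incorrectly classified points; a quick check of that step:
\[\begin{aligned} \sum^n_{i=1} w_i^{(m)} \cdot \exp \left[ - \, y_i \cdot \beta \cdot G(x_i) \right] &= e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} \\ &= \left( e^{\beta} - e^{-\beta} \right) \sum^n_{i=1} w_i^{(m)} \cdot I(y_i \ne G(x_i)) + e^{-\beta} \sum^n_{i=1} w_i^{(m)} \end{aligned}\]For any fixed $\beta > 0$, minimizing over $G$ therefore amounts to minimizing the weighted misclassification error, and setting the derivative with respect to $\beta$ to zero then yields $\beta_m = \frac{1}{2} \log \left( \frac{1-\text{err}_m}{\text{err}_m} \right)$.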
Q. Why Exponential Loss?
Population minimizer of the exponential loss:
\[\begin{aligned} f^{*} (x) &= \underset{f(x)}{\text{argmin}} \; E_{Y \mid x} \left[ L(Y, f(x)) \right] \\ &= \underset{f(x)}{\text{argmin}} \; E_{Y \mid x} \left[ \exp \left( -Y \cdot f(x) \right) \right] \\ &= \frac{1}{2} \log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} \end{aligned}\]equivalently,
\[P(Y = 1 \mid x) = \frac{\exp \left( f^* (x) \right)}{\exp \left( -f^* (x) \right) + \exp \left( f^* (x) \right)}\]Derivation
It is true that $P(Y=1 \mid x) = 1 - P(Y=-1 \mid x)$.
\[\begin{aligned} f^*(x) &= \frac{1}{2} \log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} \\ \exp \left( 2 \cdot f^* (x) \right) &= \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} = \frac{P(Y = 1 \mid x)}{1 - P(Y=1 \mid x)} \\ \exp \left( 2 \cdot f^* (x) \right) \cdot (1 - P(Y = 1 \mid x)) &= P(Y = 1 \mid x) \\ \exp \left( 2 \cdot f^* (x) \right) &= P(Y = 1 \mid x) + \exp \left( 2 \cdot f^* (x) \right) \cdot P(Y = 1 \mid x) \\ &= P(Y = 1 \mid x) \cdot \left( 1 + \exp \left( 2 \cdot f^* (x) \right) \right) \\ P(Y = 1 \mid x) &= \frac{\exp \left( 2 \cdot f^* (x) \right)}{1 + \exp \left( 2 \cdot f^* (x) \right)} = \frac{\exp \left( f^* (x) \right)}{\exp \left( -f^* (x) \right) + \exp \left( f^* (x) \right)} \end{aligned}\]$\blacksquare$
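The log-odds form of $f^*(x)$ used at the start of this derivation can itself be verified by differentiating the conditional risk:
\[\begin{aligned} E_{Y \mid x} \left[ e^{-Y f(x)} \right] &= P(Y=1 \mid x) \, e^{-f(x)} + P(Y=-1 \mid x) \, e^{f(x)} \\ \frac{\partial}{\partial f(x)} E_{Y \mid x} \left[ e^{-Y f(x)} \right] &= -P(Y=1 \mid x) \, e^{-f(x)} + P(Y=-1 \mid x) \, e^{f(x)} = 0 \\ \Longrightarrow \quad e^{2 f^*(x)} &= \frac{P(Y=1 \mid x)}{P(Y=-1 \mid x)} \quad \Longrightarrow \quad f^*(x) = \frac{1}{2} \log \frac{P(Y=1 \mid x)}{P(Y=-1 \mid x)} \end{aligned}\]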
The key thing to remember about <AdaBoost> is that fitting it is, in fact, performing "Forward Stagewise Updates for the ERM with Exponential Loss"!
That said, the GBM family that came later performs better, so AdaBoost is not used much these days.