[ML] Machine Learning Techniques: Lecture 8 Adaptive Boosting

ML: basic techniques study
Package: scikit-learn
Course: Machine Learning Techniques
Summary: Lecture 8 Adaptive Boosting

Adaptive Boosting (AdaBoost) Algorithm

Initialize $\mathbf{u}^{(1)}=[\frac{1}{N},\frac{1}{N},\cdots ,\frac{1}{N}]$
for $t=1,2,\cdots ,T$ ($T$ is chosen by the user):

Obtain $g_t$ from $A(D,\mathbf{u}^{(t)})$, where the base algorithm $A$ minimizes the $\mathbf{u}^{(t)}$-weighted 0/1 error
$$ g_t\leftarrow \underset{h\in H}{argmin}\left ( \sum_{n=1}^{N}u_n^{(t)}[y_n\neq h(\mathbf{x}_n)] \right ) $$
Update $\mathbf{u}^{(t)}$ to $\mathbf{u}^{(t+1)}$ by
$$ \begin{align*} (incorrect\ examples)\ [y_n\neq g_t(\mathbf{x}_n)] &: u_n^{(t+1)} \leftarrow u_n^{(t)}\: \cdot\: \blacklozenge _t \\ (correct\ examples)\ [y_n= g_t(\mathbf{x}_n)] &: u_n^{(t+1)} \leftarrow u_n^{(t)}\ /\ \blacklozenge _t \\ \blacklozenge _t =\sqrt{\frac{1-\epsilon _t}{\epsilon _t}}\ and\ \epsilon _t&=\frac{\sum_{n=1}^{N}u_n^{(t)}[y_n\neq g_t(\mathbf{x}_n)]}{\sum_{n=1}^{N}u_n^{(t)}} \end{align*} $$
Compute $\alpha_t=\mathrm{ln}(\blacklozenge _t )$

Return $G(\mathbf{x})=sign \left (\sum_{t=1}^{T}\alpha_t g_t(\mathbf{x}) \right )$
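The steps above can be sketched in NumPy as follows. This is a minimal sketch under my own assumptions, not the course's reference code: labels are in $\{-1,+1\}$, the base algorithm $A$ is a brute-force decision stump (the stump itself is described in the AdaBoost-Stump section), the names `adaboost` and `fit_stump_bruteforce` are hypothetical, and $\epsilon_t$ is clipped away from 0 and 1, a guard the notes do not include.

```python
import numpy as np

def fit_stump_bruteforce(X, y, u):
    """Hypothetical weak learner A(D, u): the decision stump h(x) = s * sign(x_i - theta)
    with the smallest u-weighted 0/1 error, found by trying every (s, i, theta).
    Brute force O(d * N^2), chosen only for readability."""
    N, d = X.shape
    best_err, best_params = np.inf, None
    for i in range(d):
        xs = np.unique(X[:, i])                      # sorted distinct values of feature i
        # candidate thresholds: below every point, then midpoints between neighbours
        thetas = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2))
        for theta in thetas:
            for s in (1, -1):
                pred = s * np.where(X[:, i] > theta, 1, -1)
                err = u[pred != y].sum()
                if err < best_err:
                    best_err, best_params = err, (s, i, theta)
    s, i, theta = best_params
    return lambda Z: s * np.where(Z[:, i] > theta, 1, -1)


def adaboost(X, y, T, fit_weak=fit_stump_bruteforce):
    """AdaBoost following the steps above; y must be in {-1, +1}."""
    N = len(y)
    u = np.full(N, 1.0 / N)                          # u^(1) = [1/N, ..., 1/N]
    gs, alphas = [], []
    for _ in range(T):
        g = fit_weak(X, y, u)                        # g_t <- A(D, u^(t))
        wrong = g(X) != y
        eps = u[wrong].sum() / u.sum()               # weighted error epsilon_t
        eps = min(max(eps, 1e-12), 1 - 1e-12)        # assumed guard: avoid /0 when eps is 0 or 1
        diamond = np.sqrt((1 - eps) / eps)           # re-weighting factor
        u = np.where(wrong, u * diamond, u / diamond)  # scale mistakes up, hits down
        gs.append(g)
        alphas.append(np.log(diamond))               # alpha_t = ln(diamond_t)

    def G(Z):
        score = sum(a * g(Z) for a, g in zip(alphas, gs))
        return np.where(score >= 0, 1, -1)           # ties (score == 0) mapped to +1 here
    return G, np.array(alphas)


# Usage: no single stump fits these three 1-D points, but three boosted stumps can
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1, -1, 1])
G, alphas = adaboost(X, y, T=3)
print(G(X), alphas)
```

On this toy set the three rounds happen to have $\epsilon_t = \frac{1}{3}, \frac{1}{4}, \frac{1}{6}$, so $\alpha_t = \mathrm{ln}\sqrt{2}, \mathrm{ln}\sqrt{3}, \mathrm{ln}\sqrt{5}$, and the weighted vote reproduces $y$ exactly.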

Intuition

Adaptive Boosting = a weak base learning algorithm $A$ (the students)
+ an optimal re-weighting factor $\blacklozenge _t$ (the teacher)
+ a "magic" linear aggregation $\alpha_t$ (the conclusion of the whole class)
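What makes $\blacklozenge _t$ the "teacher" can be checked numerically: after re-weighting, the misclassified mass $\epsilon_t\blacklozenge_t$ and the correct mass $(1-\epsilon_t)/\blacklozenge_t$ both equal $\sqrt{\epsilon_t(1-\epsilon_t)}$ (times the total weight), so the $g_t$ just chosen scores exactly $\frac{1}{2}$ under $\mathbf{u}^{(t+1)}$, no better than random guessing, and the next round is pushed toward a different hypothesis. The sketch below uses arbitrary random weights and hypothetical mistakes; nothing in it comes from the original notes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
u = rng.random(N) + 0.1               # arbitrary positive weights u^(t)
wrong = rng.random(N) < 0.3           # hypothetical mistakes of g_t
wrong[:2] = [True, False]             # make sure both groups are non-empty

eps = u[wrong].sum() / u.sum()                    # epsilon_t under u^(t)
diamond = np.sqrt((1 - eps) / eps)
u_next = np.where(wrong, u * diamond, u / diamond)
eps_next = u_next[wrong].sum() / u_next.sum()     # the same g_t judged under u^(t+1)
print(eps, eps_next)
```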

Theoretical Guarantee of AdaBoost

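$$ E_{out}(G)\le E_{in}(G)+O\left ( \sqrt{\underset{d_{VC}\ of\ all\ possible\ G}{\underbrace{O(d_{VC}(H)\cdot T\mathrm{log}T)}}\cdot \frac{\mathrm{log}N}{N}} \right ) $$
First term: if every round satisfies $\epsilon _t \le \epsilon < \frac{1}{2}$ (each $g_t$ is strictly better than random guessing), then $E_{in}(G)=0$ after $T=O(\mathrm{log}N)$ rounds.
Second term: since $T=O(\mathrm{log}N)$ grows only slowly, this complexity term can also be made small when $N$ is large enough.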
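The $T=O(\mathrm{log}N)$ claim rests on the standard training-error bound $E_{in}(G)\le\prod_{t=1}^{T}2\sqrt{\epsilon_t(1-\epsilon_t)}$. Assuming every $\epsilon_t\le\epsilon<\frac{1}{2}$, the bound is at most $\left(2\sqrt{\epsilon(1-\epsilon)}\right)^T$, and because $E_{in}$ is a multiple of $\frac{1}{N}$ it must be 0 once the bound drops below $\frac{1}{N}$. The helper name `rounds_until_zero_ein` and the choice $\epsilon=0.4$ are hypothetical, for illustration only.

```python
import math

def rounds_until_zero_ein(eps: float, N: int) -> int:
    """Smallest T with (2*sqrt(eps*(1-eps)))**T < 1/N, assuming every epsilon_t <= eps < 1/2."""
    rate = 2 * math.sqrt(eps * (1 - eps))    # per-round shrink factor of the bound, < 1 for eps < 1/2
    # rate**T < 1/N  <=>  T > ln(N) / (-ln(rate))
    return math.floor(math.log(N) / -math.log(rate)) + 1

for N in (10**2, 10**3, 10**4, 10**5, 10**6):
    print(N, rounds_until_zero_ein(0.4, N))
```

Each tenfold increase in $N$ adds roughly the same number of rounds, which is what $T=O(\mathrm{log}N)$ means; weak learners closer to $\epsilon=\frac{1}{2}$ make the constant larger.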

For the optimization view behind AdaBoost, see [ML] Machine Learning Techniques: Lecture 11 Gradient Boosted Decision Tree

Unsorted notes
Computer Science 511 Theoretical Machine Learning
$E_{in}$: COS 511: Theoretical Machine Learning
$E_{out}$: COS 511: Theoretical Machine Learning
A Short Introduction to Boosting
AdaBoost general theory (lecture version)
$$ \begin{align*} E_{in}(G)&=\frac{1}{N}\sum_{i=1}^{N}[G(\mathbf{x}_i)\ne y_i] \\
&=\frac{1}{N}\sum_{i=1}^{N}\left[sign\left(\sum_{t=1}^{T}\alpha_t g_t(\mathbf{x}_i)\right)\ne y_i\right] \\
[f(\mathbf{x}_i)=\sum_{t=1}^{T}\alpha_t g_t(\mathbf{x}_i)]\ \ &=\frac{1}{N}\sum_{i=1}^{N}[sign(f(\mathbf{x}_i))\ne y_i] \\
[\because y_if(\mathbf{x}_i)\le0\Rightarrow e^{-y_if(\mathbf{x}_i)}\ge 1]\ \ &\le \frac{1}{N}\sum_{i:\,G(\mathbf{x}_i)\ne y_i}e^{-y_if(\mathbf{x}_i)} \\
&= \frac{1}{N}\sum_{i:\,G(\mathbf{x}_i)\ne y_i}e^{-y_i\sum_{t=1}^{T}\alpha_t g_t(\mathbf{x}_i)} \\
&= \frac{1}{N}\sum_{i:\,G(\mathbf{x}_i)\ne y_i}e^{\sum_{t=1}^{T}-y_i\alpha_t g_t(\mathbf{x}_i)} \\
&= \frac{1}{N}\sum_{i:\,G(\mathbf{x}_i)\ne y_i}\prod_{t=1}^{T}e^{-y_i\alpha_t g_t(\mathbf{x}_i)} \\
&= \sum_{i:\,G(\mathbf{x}_i)\ne y_i}\frac{1}{N}\prod_{t=1}^{T}e^{-y_i\alpha_t g_t(\mathbf{x}_i)} \\
&= \sum_{i:\,G(\mathbf{x}_i)\ne y_i}\frac{1}{N}e^{-y_i\alpha_1 g_1(\mathbf{x}_i)}\prod_{t=2}^{T}e^{-y_i\alpha_t g_t(\mathbf{x}_i)} \\
\left [\because \frac{1}{N}=u_i^{(1)}\ and\ e^{-y_i\alpha_t g_t(\mathbf{x}_i)}=\left\{\begin{matrix} \blacklozenge _t & g_t(\mathbf{x}_i)\ne y_i\\ \frac{1}{\blacklozenge _t} & g_t(\mathbf{x}_i)=y_i \end{matrix}\right. \right ] &= \sum_{i:\,G(\mathbf{x}_i)\ne y_i}u_i^{(2)}\prod_{t=2}^{T}e^{-y_i\alpha_t g_t(\mathbf{x}_i)} \\
&= \sum_{i:\,G(\mathbf{x}_i)\ne y_i}u_i^{(2)}e^{-y_i\alpha_2 g_2(\mathbf{x}_i)}\prod_{t=3}^{T}e^{-y_i\alpha_t g_t(\mathbf{x}_i)} \\
&= \sum_{i:\,G(\mathbf{x}_i)\ne y_i}u_i^{(3)}\prod_{t=3}^{T}e^{-y_i\alpha_t g_t(\mathbf{x}_i)} \\
&\ \ \vdots \\
&= \sum_{i:\,G(\mathbf{x}_i)\ne y_i}u_i^{(T+1)} \\
[\because u_i^{(T+1)}>0]\ \ &\le \sum_{i=1}^{N}u_i^{(T+1)} \\
&=\prod_{t=1}^{T}2\sqrt{\epsilon_t(1-\epsilon_t)} \end{align*} $$
Here the weights are left unnormalized, exactly as in the update rule. The last equality holds because, writing $U_t=\sum_{i=1}^{N}u_i^{(t)}$ (so $U_1=1$), the update multiplies the misclassified mass $\epsilon_tU_t$ by $\blacklozenge _t$ and the correct mass $(1-\epsilon_t)U_t$ by $\frac{1}{\blacklozenge _t}$, giving $U_{t+1}=2\sqrt{\epsilon_t(1-\epsilon_t)}\,U_t$. If every $\epsilon_t\le\epsilon<\frac{1}{2}$, each factor is at most $2\sqrt{\epsilon(1-\epsilon)}<1$, so the bound shrinks exponentially in $T$; since $E_{in}(G)$ is a multiple of $\frac{1}{N}$, it reaches 0 once the bound falls below $\frac{1}{N}$, i.e. after $T=O(\mathrm{log}N)$ rounds.
For the VC dimension, see [ML] Machine Learning Foundations: Lecture 7 The VC Dimension
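The two facts the $E_{in}$ derivation relies on can be checked numerically: the unnormalized weights satisfy $u_i^{(T+1)}=\frac{1}{N}e^{-y_if(\mathbf{x}_i)}$, and their sum equals $\prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)}$, which upper-bounds $E_{in}$. Neither identity needs the $g_t$ to be optimal, so this sketch uses random, hypothetical weak-hypothesis outputs (agreeing with $y$ on about 70% of the points) instead of a real base learner.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 8
y = rng.choice([-1, 1], size=N)
# Hypothetical weak hypotheses: g_t agrees with y on roughly 70% of the points
preds = np.where(rng.random((T, N)) < 0.7, y, -y)

u = np.full(N, 1.0 / N)        # u^(1)
f = np.zeros(N)                # f(x_i) = sum_t alpha_t g_t(x_i)
bound = 1.0                    # prod_t 2 sqrt(eps_t (1 - eps_t))
for t in range(T):
    wrong = preds[t] != y
    eps = u[wrong].sum() / u.sum()
    diamond = np.sqrt((1 - eps) / eps)
    u = np.where(wrong, u * diamond, u / diamond)   # unnormalized update
    f += np.log(diamond) * preds[t]                 # alpha_t = ln(diamond_t)
    bound *= 2 * np.sqrt(eps * (1 - eps))

E_in = np.mean(np.sign(f) != y)   # sign(0) = 0 counts as an error, matching y_i f(x_i) <= 0
print(E_in, u.sum(), bound)
```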

AdaBoost-Stump

Use the decision stump as the base learner; it has three parameters (feature $i$, threshold $\theta$, direction $s$):

$$ h_{s,i,\theta }(\mathbf{x})=s\cdot sign(x_i-\theta) $$

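Geometrically, a stump makes a single horizontal or vertical cut (an axis-parallel split) through the input space.
Training is also cheap: $O(d \cdot N\mathrm{log}N)$, while still searching every combination for the optimal stump.
Roughly: for each of the $d$ features, sort the $N$ values ($O(N\mathrm{log}N)$), then sweep the candidate thresholds between neighbouring values while updating the weighted error incrementally ($O(N)$).
It also works as feature selection: each decision stump uses only a single feature, so each $g_t$ picks the one feature that best reduces the current weighted error.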
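The sort-then-sweep search described above can be sketched as follows. This is a minimal sketch, assuming labels in $\{-1,+1\}$ and a hypothetical function name `fit_stump`; it sorts each feature once and evaluates every cut with cumulative sums of the weights, so the sort dominates and the total is $O(d \cdot N\mathrm{log}N)$. Because each stump reads only feature $i$, the returned `i` is the feature selected in that round.

```python
import numpy as np

def fit_stump(X, y, u):
    """Weighted decision stump h(x) = s * sign(x_i - theta) in O(d * N log N).
    Labels y must be in {-1, +1}; returns (s, i, theta, weighted_error)."""
    N, d = X.shape
    total = u.sum()
    total_neg = u[y == -1].sum()
    best_err, best_params = np.inf, None
    for i in range(d):
        order = np.argsort(X[:, i], kind="stable")        # sort feature i: O(N log N)
        xs, ys, us = X[order, i], y[order], u[order]
        # cum_pos[k] / cum_neg[k]: weight of +1 / -1 points among the k smallest x_i
        cum_pos = np.concatenate(([0.0], np.cumsum(us * (ys == 1))))
        cum_neg = np.concatenate(([0.0], np.cumsum(us * (ys == -1))))
        # Cut before sorted position k (k = 0..N-1). With s = +1 the k points on the
        # left predict -1 (wrong if y = +1), the rest predict +1 (wrong if y = -1).
        err_plus = cum_pos[:-1] + (total_neg - cum_neg[:-1])
        err_minus = total - err_plus                       # s = -1 flips every prediction
        # Only cut between distinct values; k = 0 puts every point on the right
        valid = np.concatenate(([True], xs[1:] > xs[:-1]))
        thetas = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2))
        for s, err in ((1, err_plus), (-1, err_minus)):
            masked = np.where(valid, err, np.inf)
            k = int(np.argmin(masked))                     # O(N) sweep
            if masked[k] < best_err:
                best_err, best_params = masked[k], (s, i, thetas[k])
    s, i, theta = best_params
    return s, i, theta, best_err


# Usage: feature 0 is noise, feature 1 separates the labels, so the stump picks i = 1
X = np.array([[5.0, 0.0], [3.0, 1.0], [4.0, 2.0], [1.0, 3.0]])
y = np.array([1, 1, -1, -1])
print(fit_stump(X, y, np.full(4, 0.25)))
```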

This is also the algorithm behind the world's first real-time face detection program (Viola and Jones).

Code

```python
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_gaussian_quantiles


# Build the data: two Gaussian-quantile blobs
X1, y1 = make_gaussian_quantiles(cov=2.,
                                 n_samples=200, n_features=2,
                                 n_classes=2, random_state=1)
X2, y2 = make_gaussian_quantiles(mean=(3, 3), cov=1.5,
                                 n_samples=300, n_features=2,
                                 n_classes=2, random_state=1)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, - y2 + 1))   # swap labels 0 <-> 1 in the second blob

plot_colors = "br"
plot_step = 0.02
class_names = "AB"

# Grid for drawing the decision regions
xx1_min, xx1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
xx2_min, xx2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx1, xx2 = np.meshgrid(np.arange(xx1_min, xx1_max, plot_step),
                       np.arange(xx2_min, xx2_max, plot_step))

n_plots = 4
f, axarr = plt.subplots(n_plots // 2, 2)
for n in range(n_plots):
    # Number of boosting rounds per panel: T = 1, 20, 30, 40
    T = 1 if n == 0 else (n + 1) * 10
    ax = axarr[n // 2, n % 2]
    # AdaBoost with decision stumps (depth-1 trees) as the base learner
    bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1, min_samples_leaf=1), n_estimators=T)
    # Train on the data
    bdt.fit(X, y)

    # Draw the decision regions
    Y = bdt.predict(np.c_[xx1.ravel(), xx2.ravel()])
    Y = Y.reshape(xx1.shape)
    ax.contourf(xx1, xx2, Y, cmap=plt.cm.Paired)

    # Draw the training data (boolean mask per class)
    for i, l, c in zip(range(2), class_names, plot_colors):
        idx = y == i
        ax.scatter(X[idx, 0], X[idx, 1], c=c, label="Class %s" % l)
    # Set the axis limits
    ax.set_xlim(xx1_min, xx1_max)
    ax.set_ylim(xx2_min, xx2_max)

    if n == 0:
        # Draw the legend in the first panel only
        ax.legend(loc='upper left')
    # Panel title
    ax.set_title('T={}'.format(T))

# Figure-level title
f.suptitle('AdaBoost-Stump')
# Adjust the vertical spacing between panels
plt.subplots_adjust(hspace=0.3)
plt.show()
```
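To see the feature-selection effect on the same data, a fitted `AdaBoostClassifier` can be inspected: `estimators_` holds the fitted stumps, each stump's `tree_.feature[0]` and `tree_.threshold[0]` give the feature $i$ and threshold $\theta$ at its root, and `staged_score` yields the training accuracy of the aggregated $G$ after each round. This is a sketch assuming a recent scikit-learn; the choice of 50 rounds is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Same two-blob data as the plotting script
X1, y1 = make_gaussian_quantiles(cov=2., n_samples=200, n_features=2,
                                 n_classes=2, random_state=1)
X2, y2 = make_gaussian_quantiles(mean=(3, 3), cov=1.5, n_samples=300,
                                 n_features=2, n_classes=2, random_state=1)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2 + 1))

bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
bdt.fit(X, y)

# Root node of each fitted stump: which feature it cuts and at which threshold
# (a tree whose root never split would report feature -2)
features = [int(est.tree_.feature[0]) for est in bdt.estimators_]
thresholds = [float(est.tree_.threshold[0]) for est in bdt.estimators_]

# Training accuracy of the aggregated G after t = 1, 2, ... rounds
train_acc = list(bdt.staged_score(X, y))

print("stumps per feature:", {i: features.count(i) for i in (0, 1)})
print("first 5 cuts:", list(zip(features[:5], [round(t, 2) for t in thresholds[:5]])))
print("train acc after 1 round / all rounds:", train_acc[0], train_acc[-1])
```

As far as I know, scikit-learn's SAMME variant weights each stump by $\mathrm{ln}\frac{1-\epsilon_t}{\epsilon_t}$ in the two-class case, i.e. $2\alpha_t$ in the notation above, which does not change the sign of $G$.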

References

sklearn.ensemble.AdaBoostClassifier
sklearn.ensemble.AdaBoostRegressor

子風的知識庫
