[ML] Machine Learning Techniques: Lecture 4, soft-margin SVM

ML: Learning the basic techniques
Package: scikit-learn
Course: Machine Learning Techniques (機器學習技法)
Summary: Lecture 4, soft-margin SVM (support vector machine)

Recommended libraries

Linear: LIBLINEAR
Non-linear: LIBSVM

soft-margin SVM primal

$$ \begin{align*} \underset{b,\mathbf{w},\boldsymbol\xi}{min} &\ \ \frac{1}{2}\mathbf{w}^T\mathbf{w}+C\cdot \sum_{n=1}^{N}\xi _n\\ subject\ to &\ \ y_n(\mathbf{w}^T\mathbf{z}_n+b) \geq 1-\xi _n\ and\ \xi _n\geq 0\ for\ all\ n \end{align*} $$ A QP with $\tilde{d}+1+N$ variables and $2N$ constraints
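For a fixed $(b,\mathbf{w})$, the smallest slack that satisfies both constraints is $\xi_n=\max(0,\,1-y_n(\mathbf{w}^T\mathbf{z}_n+b))$, so the primal objective can be evaluated directly. The NumPy sketch below assumes the transformed features are already stacked row-wise in a matrix `Z`; `primal_objective` is a hypothetical helper, not a library function.

```python
import numpy as np


def primal_objective(w, b, Z, y, C):
    """Evaluate the soft-margin primal at a fixed (b, w).

    Z : (N, d~) matrix whose rows are z_n
    y : labels in {-1, +1}
    Returns the objective value and the optimal slacks xi.
    """
    margins = y * (Z @ w + b)             # y_n (w^T z_n + b)
    xi = np.maximum(0.0, 1.0 - margins)   # smallest xi_n meeting both constraints
    return 0.5 * w @ w + C * xi.sum(), xi


# Toy data with d~ = 2 and N = 3: the QP would have 6 variables and 6 constraints
Z = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
obj, xi = primal_objective(np.array([1.0, 0.0]), 0.0, Z, y, C=1.0)
print(obj, xi)
```

Only the second point lies inside the margin ($y_n(\mathbf{w}^T\mathbf{z}_n+b)=0.5$), so it gets $\xi_n=0.5$ and the objective is $0.5+1\cdot 0.5=1.0$.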

soft-margin SVM dual

$$ \begin{align*} \underset{\boldsymbol{\alpha}}{min} &\ \ \ \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}\alpha _n \alpha _m y_ny_m\mathbf{z}_n^T\mathbf{z}_m-\sum_{n=1}^{N}\alpha _n\\ subject\ to &\ \ \ \sum_{n=1}^{N}y_n \alpha _n=0;\\ &\ \ \ C\geq \alpha _n \geq0,for\ n=1,2,\cdots ,N\\ implicitly &\ \ \ \mathbf{w}=\sum_{n=1}^{N}\alpha _n y_n\mathbf{z}_n;\\ &\ \ \ \beta _n=C-\alpha _n,for\ n=1,2,\cdots ,N \end{align*} $$ A QP with $N$ variables and $2N+1$ constraints

The only difference from the hard-margin dual is the upper bound $C$ on each $\alpha_n$.
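Given a candidate $\boldsymbol\alpha$ (for example one returned by a QP solver), the dual objective, its constraints, and the implicit $\mathbf{w}$ and $\beta_n$ can be checked and recovered directly. A minimal NumPy sketch; `dual_objective` and `recover_w_beta` are hypothetical helper names, and the toy $\boldsymbol\alpha$ is the hand-derived optimum for two symmetric points.

```python
import numpy as np


def dual_objective(alpha, y, Z):
    """1/2 sum_n sum_m alpha_n alpha_m y_n y_m z_n^T z_m - sum_n alpha_n."""
    Q = np.outer(y, y) * (Z @ Z.T)   # Q_nm = y_n y_m z_n^T z_m
    return 0.5 * alpha @ Q @ alpha - alpha.sum()


def recover_w_beta(alpha, y, Z, C, tol=1e-9):
    """Check the dual constraints, then return w and beta."""
    if abs(y @ alpha) > tol:
        raise ValueError("sum_n y_n alpha_n must be 0")
    if np.any(alpha < -tol) or np.any(alpha > C + tol):
        raise ValueError("every alpha_n must satisfy 0 <= alpha_n <= C")
    w = (alpha * y) @ Z      # w = sum_n alpha_n y_n z_n
    beta = C - alpha         # beta_n = C - alpha_n
    return w, beta


# Two points, one per class, symmetric about the origin
Z = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
w, beta = recover_w_beta(alpha, y, Z, C=1.0)
print(dual_objective(alpha, y, Z), w, beta)
```

Here the objective reduces to $2a^2-2a$ with $\alpha_1=\alpha_2=a$, minimized at $a=0.5$. The bound $C$ only bites when it is smaller than that: with $C=0.3$ the optimum would be clipped to $\alpha_n=0.3$.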

Kernel soft-margin SVM

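$b$ must be solved from a free SV ($0<\alpha_s<C$, so $\xi_s=0$ and $y_s(\mathbf{w}^T\mathbf{z}_s+b)=1$ holds exactly). In the rare case with no free SV, $b$ is only constrained by the KKT conditions (slightly differently from the hard-margin case).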

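$\mathbf{x}_n$ can be replaced by $\mathbf{z}_n$, extending the model to higher-dimensional transforms (in the dual, $\mathbf{z}_n^T\mathbf{z}_m$ is computed with a kernel $K(\mathbf{x}_n,\mathbf{x}_m)$).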

Overfitting is still possible, so $(\gamma,C)$ must be chosen carefully.
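A NumPy sketch of the kernel version for the Gaussian kernel $K(\mathbf{x},\mathbf{x}')=\exp(-\gamma\|\mathbf{x}-\mathbf{x}'\|^2)$: with a free SV $s$, $b=y_s-\sum_{n}\alpha_n y_n K(\mathbf{x}_n,\mathbf{x}_s)$, and predictions use $f(\mathbf{x})=\sum_n\alpha_n y_n K(\mathbf{x}_n,\mathbf{x})+b$. The helper names are hypothetical, and the toy $\boldsymbol\alpha=1/(1-k)$ with $k=e^{-4\gamma}$ is the dual optimum for two symmetric points, derived by hand for this illustration.

```python
import numpy as np


def gaussian_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def bias_from_free_sv(alpha, y, X, C, gamma, tol=1e-6):
    """b = y_s - sum_n alpha_n y_n K(x_n, x_s) for the first free SV s."""
    free = np.flatnonzero((alpha > tol) & (alpha < C - tol))
    if free.size == 0:
        raise ValueError("no free SV: b is only bounded by the KKT conditions")
    s = free[0]
    K_s = gaussian_kernel(X, X[s:s + 1], gamma)[:, 0]  # K(x_n, x_s) for all n
    return y[s] - (alpha * y) @ K_s


def decision_function(x_new, alpha, y, X, b, gamma):
    """f(x) = sum_n alpha_n y_n K(x_n, x) + b for each row of x_new."""
    return (alpha * y) @ gaussian_kernel(X, x_new, gamma) + b


gamma, C = 1.0, 10.0
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
k = np.exp(-4.0 * gamma)               # K(x_1, x_2) = exp(-gamma * 2^2)
alpha = np.full(2, 1.0 / (1.0 - k))    # dual optimum for this symmetric pair
b = bias_from_free_sv(alpha, y, X, C, gamma)
print(b, decision_function(X, alpha, y, X, b, gamma))
```

Both points are free SVs ($\alpha_n\approx1.02<C$) and come out exactly on the margin, $f=\pm1$. As $\gamma$ grows, the off-diagonal kernel values shrink toward 0 and each point only "sees" itself, which is one way a Gaussian-kernel SVM can overfit.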

Meaning of $\alpha_n$

Two complementary slackness conditions:

$$ \begin{align*} \alpha_n(1- \xi_n- y_n(\mathbf{w}^T\mathbf{z}_n+b))&=0\\ (C-\alpha_n)\xi _n&=0 \end{align*} $$

$\alpha_n=0,\xi_n=0$

non-SV
May be far from the boundary, or exactly on it

$C>\alpha_n>0,\xi_n=0$

$\square$ free SV
Exactly on the boundary

$\alpha_n=C,\xi_n=1- y_n(\mathbf{w}^T\mathbf{z}_n+b)$, which is exactly the amount of margin violation

$\triangle$ bounded SV
Violates the margin, or lies exactly on the boundary

These categories can be used for data analysis: which points help define the boundary, which do not matter, and which may be noise.
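The case analysis above can be turned into a small labelling routine. This sketch assumes the solved $\alpha_n$ and the values $y_n(\mathbf{w}^T\mathbf{z}_n+b)$ are already available (for instance from a trained model); `categorize` is a hypothetical helper, the input numbers are made up, and the tolerance `tol` is an assumption, since solvers only reach $0$ and $C$ up to numerical precision.

```python
import numpy as np


def categorize(alpha, yf, C, tol=1e-6):
    """Label each example as non-SV, free SV or bounded SV.

    alpha : solved dual variables alpha_n
    yf    : y_n * (w^T z_n + b) for each example
    Returns the labels and the slacks xi_n = max(0, 1 - yf_n).
    """
    alpha = np.asarray(alpha, dtype=float)
    yf = np.asarray(yf, dtype=float)
    labels = np.where(alpha <= tol, "non-SV",
                      np.where(alpha >= C - tol, "bounded SV", "free SV"))
    # xi_n can only be non-zero when alpha_n = C (second slackness condition)
    xi = np.maximum(0.0, 1.0 - yf)
    return labels.tolist(), xi


# Hypothetical solver output for four examples with C = 1
labels, xi = categorize([0.0, 0.3, 1.0, 1.0], [2.5, 1.0, 0.4, -0.2], C=1.0)
for lab, v in zip(labels, xi):
    print(lab, v)
```

For a bounded SV, $0<\xi_n<1$ means the point is inside the margin but still correctly classified, while $\xi_n>1$ means it is on the wrong side of the decision boundary; points with large $\xi_n$ are the natural noise candidates.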

Model Selection

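Use validation to choose suitable parameters.
In the figure, the horizontal axis is $C$ and the vertical axis is $\gamma$; $E_{cv}$ is very commonly used for SVM.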

$$ E_{loocv} \leq \frac{\#SV}{N} $$ where $N$ is the number of training examples

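The figure below shows the number of SVs; as it shows, this count cannot be used to pick the best model,
because the formula above only gives an upper bound on the error, and the actual error remains unknown.
In practice it is used to rule out unreasonable models first, and $E_{cv}$ is then computed for the rest.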
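A sketch of that screening step in plain Python: compute the bound $\#SV/N$ for each candidate $(C,\gamma)$ and keep only the candidates whose bound is small enough, before spending time on $E_{cv}$. The helper names, the SV counts, and the threshold `max_bound` are all hypothetical, chosen only to illustrate the procedure.

```python
def loocv_upper_bound(n_sv, n):
    """Upper bound on the leave-one-out error: E_loocv <= #SV / N."""
    return n_sv / n


def screen_models(sv_counts, n, max_bound):
    """Keep the (C, gamma) candidates whose #SV/N bound does not exceed max_bound."""
    return {params: n_sv for params, n_sv in sv_counts.items()
            if loocv_upper_bound(n_sv, n) <= max_bound}


# Hypothetical SV counts for three (C, gamma) candidates trained on N = 16 points
sv_counts = {(1, 2): 5, (1, 100): 16, (0.1, 2): 10}
kept = screen_models(sv_counts, n=16, max_bound=0.5)
print(kept)  # only these candidates go on to the E_cv comparison
```

A bound of $16/16=1$ says nothing about a model's error, so that candidate is dropped; the bound still cannot rank the survivors, which is why $E_{cv}$ is needed afterwards.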

Code

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Training data: 16 points in 2-D
X = np.c_[(.4, -.7),
          (-1.5, -1),
          (-1.4, -.9),
          (-1.3, -1.2),
          (-1.1, -.2),
          (-1.2, -.4),
          (-.5, 1.2),
          (-1.5, 2.1),
          # -- the 8 points above are class 0, the 8 points below are class 1
          (1, 1),
          (1.3, .8),
          (1.2, .5),
          (.2, -2),
          (.5, -2.4),
          (.2, -2.3),
          (0, -2.7),
          (1.3, 2.1)].T
Y = np.r_[[0] * 8 + [1] * 8]

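# The larger C is, the less margin violation the model tolerates
C = 1
# fit() trains each model
# linear kernel: x^T x'
g_svm_linear = svm.SVC(kernel='linear', C=C).fit(X, Y)
# polynomial kernel, sklearn form (gamma * x^T x' + coef0)^degree: (5 + 10 x^T x')^3
g_svm_poly = svm.SVC(kernel='poly', degree=3, coef0=5, gamma=10, C=C).fit(X, Y)
# Gaussian (RBF) kernel, sklearn form exp(-gamma |x - x'|^2): exp(-2 |x - x'|^2)
g_svm_rbf = svm.SVC(kernel='rbf', gamma=2, C=C).fit(X, Y)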

# Titles for the plots (raw strings, so the LaTeX backslashes are not read as escapes)
titles = [r"linear kernel => $\mathbf{x}^T\mathbf{x}'$",
          r"polynomial kernel => $(5+10\mathbf{x}^T\mathbf{x}')^3$",
          r"Gaussian kernel => $e^{(-2|\mathbf{x}-\mathbf{x}'|^2)}$"]

# Build a 200 x 100 meshgrid covering the data
XX1, XX2 = np.mgrid[-2:2:200j, -3:3:100j]

for i, g_svm in enumerate([g_svm_linear, g_svm_poly, g_svm_rbf]):
    # 5-fold cross validation (accuracy on each fold)
    scores = cross_val_score(g_svm, X, Y, cv=5)
    # Mean accuracy
    m = scores.mean()
    # Standard deviation of the accuracy
    sd = scores.std()

    plt.subplot(3, 1, i + 1)
    # Adjust the vertical space between subplots
    plt.subplots_adjust(hspace=.6)

    # predict() returns the predicted class label of every grid point
    YY = g_svm.predict(np.c_[XX1.ravel(), XX2.ravel()])
    # Reshape the labels back to the grid shape of XX1
    YY = YY.reshape(XX1.shape)

    # Shade the predicted regions (contour only draws lines, contourf fills them)
    plt.contourf(XX1, XX2, YY, cmap='bone', alpha=0.5)
    
    # In sklearn's decision_function the margin boundaries sit at -1 and +1
    # (free SVs satisfy y_n f(x_n) = 1), so the margin level is 1
    margin = 1.0
    # dual_coef_ holds y_n * alpha_n for each support vector; free SVs have 0 < alpha_n < C
    alpha = np.abs(g_svm.dual_coef_[0])
    index = alpha < C * (1 - 1e-6)

    # Plot all training points
    plt.scatter(X[:, 0], X[:, 1], c=Y, cmap='Paired')
    # Circle the free support vectors in green
    plt.scatter(g_svm.support_vectors_[index, 0], g_svm.support_vectors_[index, 1],
                edgecolors='g', facecolors='none', linewidths=3, s=100)

    
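    # decision_function() returns the signed value w^T z + b at every grid point
    YY = g_svm.decision_function(np.c_[XX1.ravel(), XX2.ravel()])
    # Reshape back to the grid shape of XX1
    YY = YY.reshape(XX1.shape)
    # Draw the decision boundary (0) and the two margin boundaries (-margin, +margin)
    plt.contour(XX1, XX2, YY, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'], levels=[-margin, 0, margin])
    # Title: mean CV accuracy +/- 3 standard deviations
    plt.title(r'{} Accuracy: {:0.2f}$\pm${:0.2f}'.format(titles[i], m, sd * 3))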

plt.show()

References

sklearn.svm.SVC

子風的知識庫
