[ML] Machine Learning Foundations: Lecture 8, Noise and Error

ML: Fundamentals
Course: Machine Learning Foundations
Summary: Lecture 8, Noise and Error

What if the data contains noise?

Original	Probabilistic
$\mathbf{x}\sim \mathbb{P}(\mathbf{x})$	$\mathbf{x}\sim \mathbb{P}(\mathbf{x})$
$f(\mathbf{x}) \neq h(\mathbf{x})$	$y \neq h(\mathbf{x})\ \text{with}\ y \sim \mathbb{P}(y|\mathbf{x})$

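The VC bound still applies, as long as the $(\mathbf{x}, y)$ pairs are i.i.d.
Target distribution $\mathbb{P}(y|\mathbf{x})$
It can be viewed as an ideal value plus noise
$\mathbb{P}(\mathrm{o}|\mathbf{x})=0.7$ => the ideal value
$\mathbb{P}(\mathrm{x}|\mathbf{x})=0.3$ => noise
In fact, the earlier deterministic setting is just the special case $\mathbb{P}(y|\mathbf{x})=1$ for $y=f(\mathbf{x})$ (and $0$ otherwise)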
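The "ideal value plus noise" view can be sketched by sampling labels from a target distribution. This is only an illustrative sketch: the helper name `noisy_label`, the encoding of o/x as +1/-1, and the sample size are assumptions, not part of the lecture.

```python
import random

def noisy_label(f_x, flip_prob, rng):
    """Return the ideal label f_x with probability 1 - flip_prob,
    otherwise the flipped label (the noise)."""
    return -f_x if rng.random() < flip_prob else f_x

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed encoding: o -> +1 (ideal value), x -> -1 (noise), P(o | x) = 0.7
samples = [noisy_label(+1, 0.3, rng) for _ in range(10_000)]
frac_ideal = samples.count(+1) / len(samples)  # should be close to 0.7
```

Setting `flip_prob=0.0` recovers the deterministic case, where the label always equals $f(\mathbf{x})$.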

Pointwise error measure

$E(g,f)=\text{average of}\ err(g(\mathbf{x}), f(\mathbf{x}))$

In-sample	Out-of-sample
$E_{in}(g)=\frac{1}{N}\sum_{n=1}^{N}err(g(\mathbf{x_n}),y_n)$	$E_{out}(g)=\underset{(\mathbf{x},y) \sim P}{\mathbb{E}} [err(g(\mathbf{x}),y)]$
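As a minimal sketch of the in-sample formula, $E_{in}$ is just the average of a pointwise error over the sample. The toy data, the threshold hypothesis `g`, and all function names below are hypothetical and chosen only for illustration.

```python
def zero_one_err(y_hat, y):
    """Pointwise 0/1 error: 1 if the prediction is wrong, else 0."""
    return 1.0 if y_hat != y else 0.0

def square_err(y_hat, y):
    """Pointwise squared error."""
    return (y_hat - y) ** 2

def e_in(g, xs, ys, err):
    """In-sample error: average pointwise error over the N examples."""
    return sum(err(g(x), y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical one-dimensional toy sample
xs = [-2.0, -1.0, 0.5, 3.0]
ys = [-1, +1, +1, +1]

def g(x):
    """Hypothetical hypothesis: threshold at zero."""
    return 1 if x > 0 else -1

e_in_01 = e_in(g, xs, ys, zero_one_err)  # one mistake out of four
e_in_sq = e_in(g, xs, ys, square_err)    # that mistake costs (-1 - 1)^2 = 4
```

Passing `err` as a parameter mirrors the point of this section: the same averaging applies to whichever pointwise measure is chosen.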

Choices of err

0/1 error	square error
$err(\tilde{y},y)=[\tilde{y} \neq y]$	$err(\tilde{y},y)=(\tilde{y}-y)^2$
Typically used for classification	Typically used for regression

Different choices of err lead to a different ideal mini-target $f(\mathbf{x})$

$\mathbb{P}(y=1|\mathbf{x})=0.2$
$\mathbb{P}(y=2|\mathbf{x})=0.7$
$\mathbb{P}(y=3|\mathbf{x})=0.1$

0/1 error	square error
$$ \tilde{y}= \left\{\begin{matrix} 1 & \text{avg. err} & 0.8\\ 2 & \text{avg. err} & 0.3\ (*)\\ 3 & \text{avg. err} & 0.9\\ 1.9 & \text{avg. err} & 1.0 \end{matrix}\right. $$	$$ \tilde{y}= \left\{\begin{matrix} 1 & \text{avg. err} & 1.1=(1-2)^2\cdot 0.7+(1-3)^2\cdot 0.1\\ 2 & \text{avg. err} & 0.3\\ 3 & \text{avg. err} & 1.5\\ 1.9 & \text{avg. err} & 0.29\ (*) \end{matrix}\right. $$
$\text{ideal}\ f(\mathbf{x})=\underset{y\in \mathcal{Y}}{\operatorname{argmax}}\ \mathbb{P}(y|\mathbf{x})$	$\text{ideal}\ f(\mathbf{x})=\sum_{y\in \mathcal{Y}}y\cdot \mathbb{P}(y|\mathbf{x})$
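The averages in the table above can be rechecked with a few lines of Python. This is a sketch that hard-codes the three probabilities from the example; the function and variable names are assumptions.

```python
# Target distribution P(y | x) from the example
p = {1: 0.2, 2: 0.7, 3: 0.1}

def avg_zero_one(y_hat):
    """Expected 0/1 error of always predicting y_hat."""
    return sum(prob for y, prob in p.items() if y != y_hat)

def avg_square(y_hat):
    """Expected squared error of always predicting y_hat."""
    return sum(prob * (y_hat - y) ** 2 for y, prob in p.items())

candidates = [1, 2, 3, 1.9]
zero_one = {c: round(avg_zero_one(c), 4) for c in candidates}
square = {c: round(avg_square(c), 4) for c in candidates}

# Ideal mini-targets: most probable label vs. probability-weighted mean
best_zero_one = max(p, key=p.get)
best_square = sum(y * prob for y, prob in p.items())
```

Under 0/1 error the best candidate is the most probable label (2), while under square error it is the weighted mean (1.9), which is not even one of the labels.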

Different applications call for different error weights

	$g$ = +1	$g$ = -1
$f$ = +1	no error	false reject
$f$ = -1	false accept	no error

Supermarket discount

false reject : 10 => avoid losing customers
false accept : 1

Security system

false reject : 1
false accept : 1000 => prevent confidential information from leaking

Modifying the PLA (pocket) algorithm for error weights

$$ E_{in}^W(h)=\frac{1}{N}\sum_{n=1}^{N} \begin{Bmatrix} 1 & y_n= +1\\ 1000 & y_n= -1 \end{Bmatrix} \cdot [y_n \neq h(\mathbf{x_n})] $$

This is equivalent to copying each $(\mathbf{x_n},y_n=-1)$ example 1000 times,
while each $(\mathbf{x_n},y_n=+1)$ example keeps its original weight of 1
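A small sketch can confirm that weighting mistakes and virtually copying the $y_n=-1$ examples give the same number. The toy data, the hypothesis `h`, and the function names are hypothetical; the weights 1 and 1000 come from the formula above, and $N$ stays the original sample size in both versions.

```python
def weighted_e_in(h, xs, ys, w_pos=1, w_neg=1000):
    """E_in^W: a mistake costs w_pos when y = +1 and w_neg when y = -1."""
    cost = sum((w_pos if y == +1 else w_neg)
               for x, y in zip(xs, ys) if h(x) != y)
    return cost / len(xs)

def copied_e_in(h, xs, ys, copies=1000):
    """Same quantity by copying every y = -1 example `copies` times and
    counting plain 0/1 mistakes, still dividing by the original N."""
    expanded = [(x, y) for x, y in zip(xs, ys)
                for _ in range(copies if y == -1 else 1)]
    mistakes = sum(1 for x, y in expanded if h(x) != y)
    return mistakes / len(xs)

# Hypothetical toy sample and threshold hypothesis
xs = [-2.0, -1.0, 0.5, 3.0]
ys = [-1, +1, -1, +1]

def h(x):
    return 1 if x > 0 else -1

by_weight = weighted_e_in(h, xs, ys)  # one false reject (1) + one false accept (1000)
by_copy = copied_e_in(h, xs, ys)
```

The copying version is shown only to illustrate the equivalence; it materializes every duplicate and would be wasteful on real data.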

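Modified procedure

Initialize the pocket weights $\mathbf{\widehat{w}}$
For $t = 0, 1, \cdots $

Find the next mistake of $\mathbf{w_t}$, called $(\mathbf{x_{n(t)}},y_{n(t)})$ (picking it at random works better)

Examples with $y_n=-1$ should be 1000 times more likely to be picked than before

Correct the mistake: $\mathbf{w_{t+1}} \leftarrow \mathbf{w_t} + y_{n(t)}\mathbf{x_{n(t)}}$
If $\mathbf{w_{t+1}}$ has a smaller $E_{in}^W$ than $\mathbf{\widehat{w}}$, replace $\mathbf{\widehat{w}}$ with $\mathbf{w_{t+1}}$
Stop after enough iterations, or once the error falls below a preset threshold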
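The procedure above could look roughly like the following in plain Python. This is a sketch under several assumptions: the function names, the zero initialization, the convention `sign(0) = -1`, the `max_updates` stopping rule, and the toy data (with a leading bias coordinate of 1) are all illustrative choices, not taken from the lecture.

```python
import random

def predict(w, x):
    """Perceptron output; a score of exactly 0 is treated as -1 (assumed convention)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def weighted_e_in(w, X, y, costs):
    """Sum of the costs of misclassified examples, divided by the original N."""
    return sum(c for x, t, c in zip(X, y, costs) if predict(w, x) != t) / len(X)

def weighted_pocket(X, y, w_neg=1000, w_pos=1, max_updates=200, seed=0):
    rng = random.Random(seed)
    costs = [w_neg if t == -1 else w_pos for t in y]
    w = [0.0] * len(X[0])
    w_hat, best = list(w), weighted_e_in(w, X, y, costs)  # pocket weights
    for _ in range(max_updates):
        mistakes = [i for i, (x, t) in enumerate(zip(X, y)) if predict(w, x) != t]
        if not mistakes:
            break
        # Pick a mistake at random; y = -1 mistakes are w_neg times more likely
        n = rng.choices(mistakes, weights=[costs[i] for i in mistakes])[0]
        # PLA correction: w <- w + y_n * x_n
        w = [wi + y[n] * xi for wi, xi in zip(w, X[n])]
        e = weighted_e_in(w, X, y, costs)
        if e < best:  # keep the better weights in the pocket
            w_hat, best = list(w), e
    return w_hat, best

# Hypothetical linearly separable toy data; first coordinate is the bias term
X = [[1, 2, 1], [1, 1, 3], [1, -1, -2], [1, -2, -1]]
y = [1, 1, -1, -1]
w_hat, best = weighted_pocket(X, y)
```

On noisy data the loop will generally run until `max_updates`, and the pocket `w_hat` is what gets returned rather than the last `w`.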

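Implementing weighted pocket
Selection can use a roulette wheel: each slot corresponds to one example, and the slots of the $(\mathbf{x_n},y_n=-1)$ examples are enlarged
It is best not to physically copy the original data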
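One way to sketch the roulette wheel is with cumulative slot widths and a binary search, so no data is copied. The helper name `roulette_pick` and the three-example label list are hypothetical; the standard library's `random.choices(population, weights=...)` offers the same kind of weighted draw.

```python
import bisect
import itertools
import random

def roulette_pick(weights, rng):
    """Return index i with probability weights[i] / sum(weights)."""
    cumulative = list(itertools.accumulate(weights))  # right edge of each slot
    r = rng.random() * cumulative[-1]                 # where the wheel stops
    # First slot whose right edge lies beyond r; min() guards against rounding
    return min(bisect.bisect_right(cumulative, r), len(weights) - 1)

# Hypothetical labels: the y = -1 example gets a slot 1000 times wider
labels = [+1, -1, +1]
weights = [1000 if t == -1 else 1 for t in labels]

rng = random.Random(0)
picks = [roulette_pick(weights, rng) for _ in range(10_000)]
share_negative = picks.count(1) / len(picks)  # expected near 1000 / 1002
```

Memory use stays proportional to the number of examples, whereas copying would grow with the weight.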

unbalanced data

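Example: in a security system, the sample contains 10 intrusion records and 999990 successful-login records
Suppose false accept = 1000 and false reject = 1
If $h(\mathbf{x})$ always outputs the constant +1, what is the error?

$$ \frac{10 \times 1000}{10+999990}=0.01 $$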

To the machine, $h(\mathbf{x})=+1$ looks like a plausible solution
For unbalanced data, the weights are usually adjusted appropriately so that $h(\mathbf{x})$ does not simply output a constant
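The arithmetic for the constant hypotheses can be spelled out as a short sketch. The counts come from the example; the variable names are assumptions, and the false-reject cost of 1 is the one listed for the security system earlier.

```python
n_intrusion = 10      # y = -1
n_valid = 999_990     # y = +1
N = n_intrusion + n_valid

cost_false_accept = 1000
cost_false_reject = 1  # as in the security-system weights

# h(x) = +1 always: every intrusion is a false accept, there are no false rejects
weighted_always_accept = n_intrusion * cost_false_accept / N
# The same hypothesis under plain, unweighted 0/1 error
plain_always_accept = n_intrusion / N
# h(x) = -1 always: every valid login is a false reject
weighted_always_reject = n_valid * cost_false_reject / N
```

Even with a weight of 1000, always accepting still scores far better than always rejecting here, which is why the weights need to be tuned to how unbalanced the data is.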

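Forum discussion

Question
Should the $N$ in $E_{in}^{W}$ also grow by the equivalent amount of data?
It was mentioned earlier that the -1 examples are virtually multiplied by 1000
So in this exercise, with w = 1000, $\frac{10 \times w}{999990+10 \times w}=0.009901$
Is this understanding wrong? Thanks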

Answer
Good question. In the usual definition of $E_{in}^{W}$, the $N$ part is conventionally left unchanged.

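Thoughts
I ran a simulation: since the hypothesis with the smallest $E_{in}$ is what gets selected, and the denominator is the same for every hypothesis, changing $N$ or not generally does not affect the selection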
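A tiny sketch shows why the choice of denominator does not change the winner. The three candidate hypotheses and their mistake counts are made up for illustration; only the sample sizes and w = 1000 come from the exercise.

```python
n_neg, n_pos, w = 10, 999_990, 1000

# Hypothetical candidates: (false accepts, false rejects) on the sample
candidates = {"h_a": (10, 0), "h_b": (2, 5_000), "h_c": (0, 20_000)}

def e_fixed_n(fa, fr):
    """Usual definition: divide by the original N."""
    return (w * fa + fr) / (n_neg + n_pos)

def e_rescaled_n(fa, fr):
    """Variant from the question: divide by the virtually enlarged N."""
    return (w * fa + fr) / (w * n_neg + n_pos)

best_fixed = min(candidates, key=lambda k: e_fixed_n(*candidates[k]))
best_rescaled = min(candidates, key=lambda k: e_rescaled_n(*candidates[k]))
question_value = round(e_rescaled_n(10, 0), 6)  # the value quoted in the question
```

Both denominators are positive constants shared by all hypotheses, so they rescale every $E_{in}^{W}$ by the same factor and leave the ordering intact.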

子風的知識庫
