[ML] 機器學習基石：第五講 Training versus Testing

ML：基礎學習
課程：機器學習基石
簡介：第五講 Training versus Testing

$$\mathbb{P}\left [ \left | E_{in}(g)-E_{out}(g) \right | > \epsilon \right ]\leq 2\cdot \color{Orange}M\cdot e^{-2\epsilon ^2N}$$
	讓 $E_{out}(g)\to E_{in}(g)$	讓 $E_{in}(g) \to 0 $
小 M	Yes	No，太少選擇
大 M	No	Yes

需取一個合適的 $M$，但若 $M=\infty $ 則必定不會收斂，那麼 $M$ 有限嗎？

是否存在一個有限的 $M=m_H$ 呢？

如下，若取聯集，事實上是高估錯誤，有許多重疊的部分

$$ \mathbb{P}[\mathit{B_1}\ or\ \mathit{B_2}\ or\ \cdots \ or\ \mathit{B_M}] \leq \mathbb{P}[\mathit{B_1}]\ or\ \mathbb{P}[\mathit{B_2}]\ or\ \cdots \ or\ \mathbb{P}[\mathit{B_M}] $$

例：$H=\{all\ lines\ h\in\mathbb{R}^2\}$ 無限多條線，但二分法下表現的種類有限
一個點，只有兩種

兩個點，只有四種

三個點，八種跟六種

四個點，最多十四種

得證：二分法下，$M \le 2^N$

成長函數

$$ \begin{align*} H&=\{hypothesis\ h:\mathbf{X}\rightarrow \{\mathrm{{\color{Red} x}},\mathrm{{\color{Cyan} o}}\}\} \\ h(\mathbf{x_1},\mathbf{x_2},\cdots ,\mathbf{x_N} ) &= (h(\mathbf{x_1}), h(\mathbf{x_2}),\cdots , h(\mathbf{x_N})) \in \{\mathrm{{\color{Red} x}},\mathrm{{\color{Cyan} o}}\}^N \\ H(\mathbf{x_1},\mathbf{x_2},\cdots ,\mathbf{x_N})&\Rightarrow 所有可能的組合\\ m_H(N) &= \underset{\mathbf{x_1},\mathbf{x_2},\cdots ,\mathbf{x_N}\in\mathbf{X}}{max}\left | H(\mathbf{x_1},\mathbf{x_2},\cdots ,\mathbf{x_N}) \right | \end{align*} $$

	hypotheses $H$	dichotomies $H(\mathbf{x_1},\mathbf{x_2},\cdots ,\mathbf{x_N})$
例如	$h\in\mathbb{R}^2$	$\left \{ \color{Cyan} {oooo},\color{Cyan} {ooo}\color{Red}{ x},\color{Cyan}{ oo}\color{Red}{ xx},\cdots \right \}$
大小	$\infty $	$m_H(N)\le 2^N$

例子：
Positive Rays

$$ m_H(N) = N+1 $$

Positive Intervals

$$ \begin{align*} m_H(N) &= \binom{N+1}{2}+1\\ &= \frac{1}{2}N^2+\frac{1}{2}N+1 \end{align*} $$

Convex Sets

$$ m_H(N) = 2^N \\ shattered\ by\ H $$

$m_H(N)$ 上升速度

$$\mathbb{P}\left [ \left | E_{in}(g)-E_{out}(g) \right | > \epsilon \right ]\leq 2\cdot \color{Orange}{m_H(N)}\cdot e^{-2\epsilon ^2N}$$

定義 break point k：從 k 開始，無論是 k+1, k+2, ... 永遠 $m_H(k) < 2^k$

positive rays: $m_H(N) = N + 1 = O(N)$

break point at 2

positive intervals: $m_H(N) =\frac{1}{2}N^2+\frac{1}{2}N+1$

break point at 3

convex sets: $m_H(N) = 2^N$

no break point

2D perceptrons: $m_H(N) < 2^N$ in some cases

break point at 4
若是 5, 6, 7, ... 也都擁有 4 個點，若 4 個點都無法 shatter，那麼它們也無法

推測 (下一講證明)：

no break point: $m_H(N) = 2^N$
break point k: $m_H(N) = O(N^{k−1})$

	hypotheses \(H\)	dichotomies \(H(\mathbf{x_1},\mathbf{x_2},\cdots ,\mathbf{x_N})\)
例如	\(h\in\mathbb{R}^2\)	\(\left \{ \color{Cyan} {oooo},\color{Cyan} {ooo}\color{Red}{ x},\color{Cyan}{ oo}\color{Red}{ xx},\cdots \right \}\)
大小	\(\infty \)	\(m_H(N)\le 2^N\)

子風的知識庫

搜尋此網誌

[ML] 機器學習基石：第五講 Training versus Testing

是否存在一個有限的 \(M=m_H\) 呢？

成長函數

\(m_H(N)\) 上升速度

留言

張貼留言