ML: Backpropagation
Course by Prof. Hung-yi Lee (李宏毅)
Introduction: Backpropagation
Intuition
A simple derivation
Where it comes from: the total differential
$$
\begin{align*}
dL(u_1,u_2,\cdots ,u_N) &= du_1 \cdot \frac{\partial L}{\partial u_1} + du_2 \cdot \frac{\partial L}{\partial u_2} + \cdots + du_N \cdot \frac{\partial L}{\partial u_N} \\
&= \sum_{i=1}^{N} du_i \cdot \frac{\partial L}{\partial u_i}\\
\frac{\partial L(u_1,u_2,\cdots ,u_N)}{\partial x} &= \sum_{i=1}^{N} \frac{\partial u_i}{\partial x} \cdot \frac{\partial L}{\partial u_i}\\
\end{align*}
$$
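The chain rule in the last line above can be checked numerically. The sketch below is only an illustration: the functions `u1`, `u2` and `L` are made-up examples (they do not come from the course), and the chain-rule sum is compared against a central finite difference.

```python
import math

# Hypothetical example functions, chosen only for illustration:
# u_1(x) = x^2, u_2(x) = sin(x), L(u_1, u_2) = u_1 * u_2.
def u1(x):
    return x ** 2

def u2(x):
    return math.sin(x)

def L(a, b):
    return a * b

def dL_dx_chain(x):
    """Sum over i of (du_i/dx) * (dL/du_i), as in the chain rule above."""
    du1_dx = 2 * x
    du2_dx = math.cos(x)
    dL_du1 = u2(x)  # dL/du_1 = u_2
    dL_du2 = u1(x)  # dL/du_2 = u_1
    return du1_dx * dL_du1 + du2_dx * dL_du2

def dL_dx_numeric(x, h=1e-6):
    """Central finite difference of L(u1(x), u2(x)) with respect to x."""
    return (L(u1(x + h), u2(x + h)) - L(u1(x - h), u2(x - h))) / (2 * h)

x0 = 1.3
print(dL_dx_chain(x0), dL_dx_numeric(x0))
```

The two printed values should agree to several decimal places; the small gap that remains comes from the finite-difference approximation, not from the chain rule.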
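Derivation
The notation, as implied by the equations below: z is the pre-activation of a neuron with inputs x_1, x_2 and weights W_1, W_2; a = f(z) is its output; z' and z'' are the pre-activations of the two next-layer neurons that receive a through the weights W_3 and W_4; C is the cost.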
$$
\begin{align*}
\frac{\partial C}{\partial W_1} &= \frac{\partial z}{\partial W_1}\frac{\partial C}{\partial z}\\
\frac{\partial z}{\partial W_1} &= x_1\\
\frac{\partial C}{\partial z} &= \frac{\partial a}{\partial z}\frac{\partial C}{\partial a}\\
&= {f}'(z)\left ( \frac{\partial {z}'}{\partial a}\frac{\partial C}{\partial {z}'} + \frac{\partial {z}''}{\partial a}\frac{\partial C}{\partial {z}''}\right )\\
&= {f}'(z)\left ( W_3\frac{\partial C}{\partial {z}'} + W_4\frac{\partial C}{\partial {z}''}\right )\\
\end{align*}
$$
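The derivation above can be traced step by step in code. This is a minimal sketch under stated assumptions, not the course's implementation: the activation f is taken to be the sigmoid, biases are omitted, and the cost is a hypothetical squared error on z' and z'' with made-up targets `t`. The backpropagated value of ∂C/∂W_1 is compared with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, x, t):
    """Forward pass of the tiny network; returns a, z', z'' and the cost C."""
    W1, W2, W3, W4 = W
    x1, x2 = x
    t1, t2 = t
    z = W1 * x1 + W2 * x2          # pre-activation (bias omitted: assumption)
    a = sigmoid(z)                 # a = f(z), with f = sigmoid (assumption)
    z_p = W3 * a                   # z'
    z_pp = W4 * a                  # z''
    # Hypothetical cost: squared error directly on z' and z''.
    C = 0.5 * ((z_p - t1) ** 2 + (z_pp - t2) ** 2)
    return a, z_p, z_pp, C

def grad_W1_backprop(W, x, t):
    """dC/dW1 = x1 * f'(z) * (W3 * dC/dz' + W4 * dC/dz'')."""
    W1, W2, W3, W4 = W
    a, z_p, z_pp, _ = forward(W, x, t)
    dC_dzp = z_p - t[0]            # dC/dz' for the assumed cost
    dC_dzpp = z_pp - t[1]          # dC/dz'' for the assumed cost
    f_prime = a * (1.0 - a)        # derivative of the sigmoid at z
    dC_dz = f_prime * (W3 * dC_dzp + W4 * dC_dzpp)
    return x[0] * dC_dz            # dz/dW1 = x1

def grad_W1_numeric(W, x, t, h=1e-6):
    """Central finite difference of C with respect to W1."""
    W1, W2, W3, W4 = W
    C_plus = forward((W1 + h, W2, W3, W4), x, t)[-1]
    C_minus = forward((W1 - h, W2, W3, W4), x, t)[-1]
    return (C_plus - C_minus) / (2 * h)

# Made-up numbers, for illustration only.
W = (0.3, -0.8, 1.5, -0.7)
x = (0.5, -1.2)
t = (0.2, 0.9)
print(grad_W1_backprop(W, x, t), grad_W1_numeric(W, x, t))
```

In a deeper network ∂C/∂z' and ∂C/∂z'' would themselves be obtained by applying the same rule to the next layer, which is what makes the backward pass recursive.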
Matrix form
$$
\begin{align*}
\frac{\partial C}{\partial \mathbf{w_1}} &= \frac{\partial z}{\partial \mathbf{w_1}}\frac{\partial C}{\partial z}\\
&= \mathbf{x}^T {f}'(z)\mathbf{w_2} \begin{bmatrix}
\frac{\partial C}{\partial {z}'}\\
\frac{\partial C}{\partial {z}''}
\end{bmatrix} \\
\end{align*}
$$
$$
\begin{align*}
\mathbf{x} &= \begin{bmatrix}
x_1\\
x_2
\end{bmatrix}\\
\mathbf{w_1} &= \begin{bmatrix}
W_1& W_2
\end{bmatrix}\\
\mathbf{w_2} &= \begin{bmatrix}
W_3& W_4
\end{bmatrix}\\
\mathbf{z_2} &= \begin{bmatrix}
{z}'\\
{z}''
\end{bmatrix}\\
\end{align*}
$$
$$
\begin{align*}
\frac{\partial C}{\partial \mathbf{w_1}} &= \frac{\partial z}{\partial \mathbf{w_1}}\frac{\partial C}{\partial z}\\
&= \begin{bmatrix}
\frac{\partial z}{\partial W_1}& \frac{\partial z}{\partial W_2}
\end{bmatrix} \frac{\partial a}{\partial z}\frac{\partial C}{\partial a} \\
&= \mathbf{x}^T {f}'(z)\frac{\partial \mathbf{z_2}}{\partial a}\frac{\partial C}{\partial \mathbf{z_2}} \\
&= \mathbf{x}^T {f}'(z)\begin{bmatrix}
\frac{\partial {z}'}{\partial a} &
\frac{\partial {z}''}{\partial a}
\end{bmatrix} \frac{\partial C}{\partial \mathbf{z_2}} \\
&= \mathbf{x}^T {f}'(z)\mathbf{w_2} \begin{bmatrix}
\frac{\partial C}{\partial {z}'}\\
\frac{\partial C}{\partial {z}''}
\end{bmatrix} \\
\end{align*}
$$
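The matrix form can be sketched with NumPy, keeping the shapes used above: x is a column vector, while w_1 and w_2 are row vectors. As before, the sigmoid activation, the missing biases, the squared-error cost and the target vector `t` are assumptions made only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up numbers; shapes follow the definitions above.
x = np.array([[0.5], [-1.2]])    # column vector [x1; x2], shape (2, 1)
w1 = np.array([[0.3, -0.8]])     # row vector [W1, W2], shape (1, 2)
w2 = np.array([[1.5, -0.7]])     # row vector [W3, W4], shape (1, 2)
t = np.array([[0.2], [0.9]])     # hypothetical targets for z' and z''

def cost(w1_row):
    """Forward pass; returns the assumed squared-error cost C."""
    z = (w1_row @ x).item()      # scalar pre-activation
    a = sigmoid(z)
    z2 = w2.T * a                # column vector [z'; z''], shape (2, 1)
    return 0.5 * float(np.sum((z2 - t) ** 2))

def grad_w1_backprop(w1_row):
    """dC/dw1 = x^T * f'(z) * (w2 @ dC/dz2), a row vector of shape (1, 2)."""
    z = (w1_row @ x).item()
    a = sigmoid(z)
    z2 = w2.T * a
    dC_dz2 = z2 - t              # column [dC/dz'; dC/dz''] for the assumed cost
    f_prime = a * (1.0 - a)      # sigmoid derivative at z
    return x.T * f_prime * (w2 @ dC_dz2).item()

def grad_w1_numeric(w1_row, h=1e-6):
    """Central finite differences, one weight at a time."""
    grad = np.zeros_like(w1_row)
    for j in range(w1_row.shape[1]):
        step = np.zeros_like(w1_row)
        step[0, j] = h
        grad[0, j] = (cost(w1_row + step) - cost(w1_row - step)) / (2 * h)
    return grad

print(grad_w1_backprop(w1))
print(grad_w1_numeric(w1))
```

The product `w2 @ dC_dz2` collapses to a single number because only one neuron feeds the next layer here; with several hidden neurons, w_2 would become a matrix and that product a vector.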