Math Notes: Matrix Calculus
線代啟示錄
Wiki Matrix calculus
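Introduction: definitions and basic properties of matrix derivatives
Definition: derivative of a scalar by a vector
Let \(f\) be a function of \(p\) independent variables \(x_1,x_2,\cdots,x_p\).
Let \(\mathbf{x} = [x_1,x_2,\cdots,x_p]^{T} \). The derivative of \(f\) with respect to \(\mathbf{x}\) is the column vector of partial derivatives: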
\(\frac{\partial f}{\partial \mathbf{x}} = \left [\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_p}\right ]^T\)
Theorem (1): \(\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial \mathbf{x}}=2\mathbf{x}\)
\(Proof:\)
$$
\begin{align*}
\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial \mathbf{x}}&=[\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial x_1},\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial x_2},\cdots,\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial x_p}]^T
\\&=[\frac{\partial \sum_{i=1}^{p}x_i^2}{\partial x_1},\frac{\partial \sum_{i=1}^{p}x_i^2}{\partial x_2},\cdots,\frac{\partial \sum_{i=1}^{p}x_i^2}{\partial x_p}]^T \\ &=[2x_1,2x_2,\cdots,2x_p]^T
\\ &=2[x_1,x_2,\cdots,x_p]^T
\\ &=2\mathbf{x}
\end{align*}$$
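A quick numerical sanity check of Theorem (1), given as a minimal sketch. The helper names (`numerical_gradient`, `squared_norm`) and the test vector are illustrative choices, not part of the original notes. Central differences approximate each partial derivative, and the result should agree with \(2\mathbf{x}\) up to rounding.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x (a list of floats)."""
    grad = []
    for k in range(len(x)):
        x_plus = x[:k] + [x[k] + h] + x[k + 1:]
        x_minus = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def squared_norm(x):
    """f(x) = x^T x, the sum of squared components."""
    return sum(xi * xi for xi in x)


x = [1.0, -2.0, 3.0]  # arbitrary test point
approx = numerical_gradient(squared_norm, x)
exact = [2 * xi for xi in x]  # Theorem (1): the gradient is 2x
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(max_error)  # expected to be tiny (rounding error only)
```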
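Theorem (2): \(\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}}=\mathbf{y}\), where \(\mathbf{y}\) does not depend on \(\mathbf{x}\); this is the vector analogue of the scalar rule \(\frac{\partial xy}{\partial x}=y\)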
\(Proof:\)
$$
\begin{align*}
\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}}&=[\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial x_1},\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial x_2},\cdots,\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial x_p}]^T\\
&=[\frac{\partial \sum_{i=1}^{p}x_iy_i}{\partial x_1},\frac{\partial \sum_{i=1}^{p}x_iy_i}{\partial x_2},\cdots,\frac{\partial \sum_{i=1}^{p}x_iy_i}{\partial x_p}]^T\\
&=[y_1,y_2,\cdots,y_p]^T \\
&=\mathbf{y}
\end{align*}$$
Theorem (3): \(\frac{\partial \mathbf{A}^T\mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}\) for a constant matrix \(\mathbf{A}\); this is the vector analogue of the scalar rule \(\frac{\partial ax}{\partial x}=a\)
\(Proof:\)
$$
\begin{align*}
\text{Let } \mathbf{A}&=[a_{ij}]_{p\times p} \\
\frac{\partial \mathbf{A}^T\mathbf{x}}{\partial \mathbf{x}}&=\frac{\partial }{\partial \mathbf{x}}\left [
\sum_{i=1}^{p}a_{i1}x_i,\sum_{i=1}^{p}a_{i2}x_i,\cdots,\sum_{i=1}^{p}a_{ip}x_i
\right ]^T\\
&=\begin{bmatrix}
\frac{\partial }{\partial \mathbf{x}}\sum_{i=1}^{p}a_{i1}x_i &
\frac{\partial }{\partial \mathbf{x}}\sum_{i=1}^{p}a_{i2}x_i &
\cdots &
\frac{\partial }{\partial \mathbf{x}}\sum_{i=1}^{p}a_{ip}x_i
\end{bmatrix}\\
&=\begin{bmatrix}
\frac{\partial }{\partial x_1}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial x_1}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial x_1}\sum_{i=1}^{p}a_{ip}x_i\\
\frac{\partial }{\partial x_2}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial x_2}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial x_2}\sum_{i=1}^{p}a_{ip}x_i\\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial }{\partial x_p}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial x_p}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial x_p}\sum_{i=1}^{p}a_{ip}x_i
\end{bmatrix}\\
&=\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
a_{21} & a_{22} & \cdots & a_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{bmatrix}\\
&=\mathbf{A}
\end{align*}$$
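The layout matters for Theorem (3): with the gradient of a scalar defined as a column vector, column \(c\) of the derivative holds the gradient of the \(c\)-th component. The sketch below checks \(\frac{\partial \mathbf{A}^T\mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}\) numerically under that convention. The function names and the test data are assumptions made for illustration.

```python
def matvec_transposed(A, x):
    """Return A^T x for a square matrix A stored as a list of rows."""
    p = len(A)
    return [sum(A[i][c] * x[i] for i in range(p)) for c in range(p)]


def derivative_column_layout(g, x, h=1e-6):
    """Entry (r, c) approximates d g(x)_c / d x_r by central differences."""
    rows = []
    for r in range(len(x)):
        x_plus = x[:r] + [x[r] + h] + x[r + 1:]
        x_minus = x[:r] + [x[r] - h] + x[r + 1:]
        g_plus, g_minus = g(x_plus), g(x_minus)
        rows.append([(gp - gm) / (2 * h) for gp, gm in zip(g_plus, g_minus)])
    return rows


A = [[1.0, 2.0], [3.0, 4.0]]  # arbitrary non-symmetric matrix
x = [0.5, -1.5]
D = derivative_column_layout(lambda v: matvec_transposed(A, v), x)
max_error = max(abs(D[r][c] - A[r][c]) for r in range(2) for c in range(2))
print(max_error)  # expected to be tiny: D matches A, not A^T
```

Using a non-symmetric \(\mathbf{A}\) is deliberate: with a symmetric matrix the check could not distinguish \(\mathbf{A}\) from \(\mathbf{A}^T\).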
Theorem (4): \((i)\ \frac{\partial \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=(\mathbf{A}+\mathbf{A}^T)\mathbf{x} \quad (ii)\ \text{if } \mathbf{A}=\mathbf{A}^T \text{ then } \frac{\partial \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=2\mathbf{A}\mathbf{x}\), the matrix analogue of the scalar rule \(\frac{\partial }{\partial x}ax^2=2ax\)
\(Proof:\)
$$
\begin{align*}
\text{Let } \mathbf{A}&=[a_{ij}]_{p\times p}
\\f(\mathbf{x})&=\mathbf{x}^T\mathbf{A}\mathbf{x}=\sum_{i=1}^{p}\sum_{j=1}^{p}x_ia_{ij}x_j \\
\frac{\partial f}{\partial x_k}&=\frac{\partial}{\partial x_k}\left [ \sum_{i=1}^{p}\sum_{j=1}^{p}x_ia_{ij}x_j \right ]
\\&=\sum_{j=1}^{p}a_{kj}x_j+\sum_{i=1}^{p}x_ia_{ik}
\\&=\sum_{j=1}^{p}a_{kj}x_j+\sum_{j=1}^{p}a_{jk}x_j
\\&=(\mathbf{A}\mathbf{x})_k+(\mathbf{A}^T\mathbf{x})_k
\\\Rightarrow \frac{\partial f}{\partial \mathbf{x}}
&=\mathbf{A}\mathbf{x}+\mathbf{A}^T\mathbf{x}\\
&=(\mathbf{A}+\mathbf{A}^T)\mathbf{x}\\
&=2\mathbf{A}\mathbf{x} \quad \text{if } \mathbf{A}=\mathbf{A}^T
\end{align*}$$
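Theorem (4) can be checked the same way. The sketch below compares a central-difference gradient of \(\mathbf{x}^T\mathbf{A}\mathbf{x}\) against \((\mathbf{A}+\mathbf{A}^T)\mathbf{x}\) for a non-symmetric \(\mathbf{A}\). The names `quadratic_form` and `numerical_gradient` and the numbers are illustrative assumptions.

```python
def quadratic_form(A, x):
    """f(x) = x^T A x, written out as a double sum."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        x_plus = x[:k] + [x[k] + h] + x[k + 1:]
        x_minus = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


A = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [5.0, 0.5, 2.0]]  # not symmetric
x = [1.0, -2.0, 0.5]
approx = numerical_gradient(lambda v: quadratic_form(A, v), x)
# Component k of (A + A^T) x
exact = [sum((A[k][j] + A[j][k]) * x[j] for j in range(3)) for k in range(3)]
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(max_error)  # expected to be tiny (rounding error only)
```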
Product rule:
If \(h(x) = f(x)g(x)\) and \(f(x)\) and \(g(x)\) are both differentiable at \(x\), then
\(h'(x) = f'(x)g(x) + f (x)g'(x)\)
Definition: derivative of a scalar by a matrix
Let \(f\) be a function of an \(m \times n\) matrix variable \(\mathbf{X}\):
$$
\mathbf{X_{m\times n}}=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n}\\
x_{21} & x_{22} & \cdots & x_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m1} & x_{m2} & \cdots & x_{mn}\\
\end{bmatrix}
$$
Assume that every partial derivative \(\frac{\partial f}{\partial x_{ij}}\) exists. Then
$$
\frac{\partial f}{\partial \mathbf{X}}=
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} & \cdots & \frac{\partial f}{\partial x_{1n}}\\
\frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{2n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f}{\partial x_{m1}} & \frac{\partial f}{\partial x_{m2}} & \cdots & \frac{\partial f}{\partial x_{mn}}\\
\end{bmatrix}_{m\times n}
$$
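Definition: derivative of a matrix by a scalar
Let \(\mathbf{F}=[f_{ij}]\) be an \(m \times n\) matrix-valued function of a scalar variable \(x\),
and assume that every \(\frac{\partial f_{ij}}{\partial x}\) exists. Then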
$$
\frac{\partial \mathbf{F}}{\partial x}=
\begin{bmatrix}
\frac{\partial f_{11}}{\partial x} & \frac{\partial f_{12}}{\partial x} & \cdots & \frac{\partial f_{1n}}{\partial x}\\
\frac{\partial f_{21}}{\partial x} & \frac{\partial f_{22}}{\partial x} & \cdots & \frac{\partial f_{2n}}{\partial x}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_{m1}}{\partial x} & \frac{\partial f_{m2}}{\partial x} & \cdots & \frac{\partial f_{mn}}{\partial x}\\
\end{bmatrix}_{m\times n}
$$
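As a small illustration of both definitions with \(m=n=2\) (the two functions below are made-up examples, not taken from the original notes): each entry of the result sits in the same position as the variable, or the component, it belongs to.

$$
f(\mathbf{X}) = x_{11}^2 + 3x_{12}x_{21}
\quad\Rightarrow\quad
\frac{\partial f}{\partial \mathbf{X}}=
\begin{bmatrix}
2x_{11} & 3x_{21}\\
3x_{12} & 0
\end{bmatrix}
$$
$$
\mathbf{F}(x)=
\begin{bmatrix}
x & x^2\\
\sin x & 1
\end{bmatrix}
\quad\Rightarrow\quad
\frac{\partial \mathbf{F}}{\partial x}=
\begin{bmatrix}
1 & 2x\\
\cos x & 0
\end{bmatrix}
$$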
Theorem (5): \(\mathbf{X}_{n \times p} \Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{X})}{\partial \mathbf{X}}=2\mathbf{X}\)
\(Proof:\)
$$
\begin{align*}
\mathbf{X}^T\mathbf{X}&=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}^T
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\\&=
\begin{bmatrix}
\sum_{i=1}^{n}x_{i1}x_{i1} &
\sum_{i=1}^{n}x_{i1}x_{i2} & \cdots &
\sum_{i=1}^{n}x_{i1}x_{ip}\\
\sum_{i=1}^{n}x_{i2}x_{i1} &
\sum_{i=1}^{n}x_{i2}x_{i2} & \cdots &
\sum_{i=1}^{n}x_{i2}x_{ip}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n}x_{ip}x_{i1} &
\sum_{i=1}^{n}x_{ip}x_{i2} & \cdots &
\sum_{i=1}^{n}x_{ip}x_{ip}
\end{bmatrix}\\
\\tr(\mathbf{X}^T\mathbf{X})&=\left (
\sum_{i=1}^{n}x_{i1}x_{i1}+
\sum_{i=1}^{n}x_{i2}x_{i2}+\cdots+
\sum_{i=1}^{n}x_{ip}x_{ip}
\right )
\\&=\sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij}^2\\
\\\frac{\partial tr(\mathbf{X}^T\mathbf{X})}{\partial x_{ij}}
&=\frac{\partial \sum_{k=1}^{n}\sum_{l=1}^{p}x_{kl}^2}{\partial x_{ij}}
\\&=2x_{ij}
\\\Rightarrow
\frac{\partial tr(\mathbf{X}^T\mathbf{X})}{\partial \mathbf{X}}&=2\mathbf{X}
\end{align*}
$$
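The same finite-difference idea extends to matrix variables. The following sketch checks Theorem (5) by perturbing one entry \(x_{ij}\) at a time. All helper names (`transpose`, `matmul`, `trace`, `numerical_matrix_gradient`) and the test matrix are assumptions introduced for this example.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def numerical_matrix_gradient(f, X, h=1e-6):
    """Entry (i, j) approximates d f / d x_ij by central differences."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


def trace_of_gram(M):
    """f(X) = tr(X^T X)."""
    return trace(matmul(transpose(M), M))


X = [[1.0, 2.0], [-3.0, 0.5], [4.0, -1.0]]  # n = 3, p = 2
G = numerical_matrix_gradient(trace_of_gram, X)
max_error = max(abs(G[i][j] - 2 * X[i][j]) for i in range(3) for j in range(2))
print(max_error)  # expected to be tiny: G matches 2X
```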
Theorem (6): \(\mathbf{A}_{n \times p},\mathbf{X}_{n \times p} \Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X})}
{\partial \mathbf{X}} = \mathbf{A}\)
\(Proof:\)
$$
\begin{align*}
\mathbf{A}^T\mathbf{X}&=
\begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1p} & a_{2p} & \cdots & a_{np}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{i=1}^{n}a_{i1}x_{i1} &
\sum_{i=1}^{n}a_{i1}x_{i2} & \cdots &
\sum_{i=1}^{n}a_{i1}x_{ip}\\
\sum_{i=1}^{n}a_{i2}x_{i1} &
\sum_{i=1}^{n}a_{i2}x_{i2} & \cdots &
\sum_{i=1}^{n}a_{i2}x_{ip}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n}a_{ip}x_{i1} &
\sum_{i=1}^{n}a_{ip}x_{i2} & \cdots &
\sum_{i=1}^{n}a_{ip}x_{ip}
\end{bmatrix} \\\\
&tr(\mathbf{A}^T\mathbf{X})=\sum_{i=1}^{n}\sum_{j=1}^{p}a_{ij}x_{ij}\\
&\frac{\partial tr(\mathbf{A}^T\mathbf{X})}{\partial x_{ij}}=a_{ij}
\Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X})}{\partial \mathbf{X}}=\mathbf{A}
\end{align*}$$
Theorem (7): \(\mathbf{A}_{n \times n},\mathbf{X}_{n \times p},\mathbf{B}_{p \times n}
\Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X}\mathbf{B})}
{\partial \mathbf{X}} = \mathbf{AB}^T\)
\(Proof:\)
$$
\begin{align*}
\mathbf{A}^T\mathbf{XB}&=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}^T
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{p1} & b_{p2} & \cdots & b_{pn}
\end{bmatrix}\\
&=\begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{p1} & b_{p2} & \cdots & b_{pn}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{i=1}^{n}a_{i1}x_{i1} &
\sum_{i=1}^{n}a_{i1}x_{i2} & \cdots &
\sum_{i=1}^{n}a_{i1}x_{ip}\\
\sum_{i=1}^{n}a_{i2}x_{i1} &
\sum_{i=1}^{n}a_{i2}x_{i2} & \cdots &
\sum_{i=1}^{n}a_{i2}x_{ip}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n}a_{in}x_{i1} &
\sum_{i=1}^{n}a_{in}x_{i2} & \cdots &
\sum_{i=1}^{n}a_{in}x_{ip}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{p1} & b_{p2} & \cdots & b_{pn}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{i1}x_{ij}b_{j1} &
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{i1}x_{ij}b_{j2} & \cdots &
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{i1}x_{ij}b_{jn}\\
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{i2}x_{ij}b_{j1} &
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{i2}x_{ij}b_{j2} & \cdots &
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{i2}x_{ij}b_{jn}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{in}x_{ij}b_{j1} &
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{in}x_{ij}b_{j2} & \cdots &
\sum_{j=1}^{p}\sum_{i=1}^{n}a_{in}x_{ij}b_{jn}
\end{bmatrix}\\\\
&tr(\mathbf{A}^T\mathbf{XB})=\sum_{k=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{n}a_{ik}x_{ij}b_{jk}\\
&\frac{\partial tr(\mathbf{A}^T\mathbf{XB})}{\partial x_{ij}}
=\sum_{k=1}^{n}a_{ik}b_{jk}=(\mathbf{A}\mathbf{B}^T)_{ij}
\Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{XB})}{\partial \mathbf{X}}
=\mathbf{A}\mathbf{B}^T\\\\
&\mathbf{A}\mathbf{B}^T
=\begin{bmatrix}
\sum_{i=1}^{n}a_{1i}b_{1i} &
\sum_{i=1}^{n}a_{1i}b_{2i} & \cdots &
\sum_{i=1}^{n}a_{1i}b_{pi}\\
\sum_{i=1}^{n}a_{2i}b_{1i} &
\sum_{i=1}^{n}a_{2i}b_{2i} & \cdots &
\sum_{i=1}^{n}a_{2i}b_{pi}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n}a_{ni}b_{1i} &
\sum_{i=1}^{n}a_{ni}b_{2i} & \cdots &
\sum_{i=1}^{n}a_{ni}b_{pi}
\end{bmatrix}
\end{align*}$$
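A shorter route to the same result, sketched here as an alternative to the component-wise proof, reduces Theorem (7) to Theorem (6). It relies on the cyclic property of the trace, \(tr(\mathbf{PQ})=tr(\mathbf{QP})\), which the notes above do not use. Since \(\mathbf{A}\mathbf{B}^T\) is \(n \times p\), the same shape as \(\mathbf{X}\), Theorem (6) applies with \(\mathbf{A}\mathbf{B}^T\) in place of \(\mathbf{A}\):

$$
\begin{align*}
tr(\mathbf{A}^T\mathbf{X}\mathbf{B})&=tr(\mathbf{B}\mathbf{A}^T\mathbf{X})\\
&=tr\left((\mathbf{A}\mathbf{B}^T)^T\mathbf{X}\right)\\
\Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X}\mathbf{B})}{\partial \mathbf{X}}&=\mathbf{A}\mathbf{B}^T
\end{align*}
$$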
Theorem (8): \(\mathbf{A}_{n \times n},\mathbf{X}_{n \times p}
\Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{A}\mathbf{X})}
{\partial \mathbf{X}} = (\mathbf{A}+\mathbf{A}^T)\mathbf{X}\)
\(Proof:\)
$$
\begin{align*}
\mathbf{X}^T\mathbf{AX}&=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}^T
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}\\
&=\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1}\\
x_{12} & x_{22} & \cdots & x_{n2}\\
\vdots & \vdots & \ddots & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{i=1}^{n}x_{i1}a_{i1} &
\sum_{i=1}^{n}x_{i1}a_{i2} & \cdots &
\sum_{i=1}^{n}x_{i1}a_{in}\\
\sum_{i=1}^{n}x_{i2}a_{i1} &
\sum_{i=1}^{n}x_{i2}a_{i2} & \cdots &
\sum_{i=1}^{n}x_{i2}a_{in}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{n}x_{ip}a_{i1} &
\sum_{i=1}^{n}x_{ip}a_{i2} & \cdots &
\sum_{i=1}^{n}x_{ip}a_{in}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{i1}a_{ij}x_{j1} &
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{i1}a_{ij}x_{j2} & \cdots &
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{i1}a_{ij}x_{jp} \\
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{i2}a_{ij}x_{j1} &
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{i2}a_{ij}x_{j2} & \cdots &
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{i2}a_{ij}x_{jp} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{ip}a_{ij}x_{j1} &
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{ip}a_{ij}x_{j2} & \cdots &
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{ip}a_{ij}x_{jp}
\end{bmatrix}\\\\
\end{align*}
$$
$$
\begin{align*}
tr(\mathbf{X}^T\mathbf{A}\mathbf{X})
&=\sum_{k=1}^{p}\sum_{j=1}^{n}\sum_{i=1}^{n}x_{ik}a_{ij}x_{jk} \\
\frac{\partial tr(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial x_{ik}}&=\sum_{j=1}^{n}a_{ij}x_{jk} +\sum_{j=1}^{n}x_{jk}a_{ji}\\
&=\sum_{j=1}^{n}a_{ij}x_{jk} +\sum_{j=1}^{n}a_{ji}x_{jk}\\
&=(\mathbf{A}\mathbf{X})_{ik}+(\mathbf{A}^T\mathbf{X})_{ik}\\
\Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial \mathbf{X}}&=(\mathbf{A}+\mathbf{A}^T)\mathbf{X}
\end{align*}
$$
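As a quick consistency check on Theorem (8) (an added observation, not part of the derivation above), take \(p=1\), so that \(\mathbf{X}\) is a single column \(\mathbf{x}\) with \(n\) entries. Then \(\mathbf{x}^T\mathbf{A}\mathbf{x}\) is a \(1 \times 1\) matrix equal to its own trace, and Theorem (8) reduces to Theorem (4):

$$
\frac{\partial tr(\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}}
=\frac{\partial \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}
=(\mathbf{A}+\mathbf{A}^T)\mathbf{x}
$$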
Theorem (9): \(\mathbf{A}_{p \times p},\mathbf{X}_{n \times p}
\Rightarrow \frac{\partial tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)}
{\partial \mathbf{X}} = \mathbf{X}(\mathbf{A}+\mathbf{A}^T)\)
\(Proof:\)
$$
\begin{align*}
\mathbf{XA}\mathbf{X}^T&=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
a_{21} & a_{22} & \cdots & a_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}^T\\
&=\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
a_{21} & a_{22} & \cdots & a_{2p}\\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1}\\
x_{12} & x_{22} & \cdots & x_{n2}\\
\vdots & \vdots & \ddots & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{i=1}^{p}x_{1i}a_{i1} &
\sum_{i=1}^{p}x_{1i}a_{i2} & \cdots &
\sum_{i=1}^{p}x_{1i}a_{ip}\\
\sum_{i=1}^{p}x_{2i}a_{i1} &
\sum_{i=1}^{p}x_{2i}a_{i2} & \cdots &
\sum_{i=1}^{p}x_{2i}a_{ip}\\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{p}x_{ni}a_{i1} &
\sum_{i=1}^{p}x_{ni}a_{i2} & \cdots &
\sum_{i=1}^{p}x_{ni}a_{ip}
\end{bmatrix}
\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1}\\
x_{12} & x_{22} & \cdots & x_{n2}\\
\vdots & \vdots & \ddots & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}\\
&=\begin{bmatrix}
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{1i}a_{ij}x_{1j} &
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{1i}a_{ij}x_{2j} & \cdots &
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{1i}a_{ij}x_{nj} \\
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{2i}a_{ij}x_{1j} &
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{2i}a_{ij}x_{2j} & \cdots &
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{2i}a_{ij}x_{nj} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{ni}a_{ij}x_{1j} &
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{ni}a_{ij}x_{2j} & \cdots &
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{ni}a_{ij}x_{nj}
\end{bmatrix}\\\\
\end{align*}
$$
$$
\begin{align*}
tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)
&=\sum_{k=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{p}x_{ki}a_{ij}x_{kj} \\
\frac{\partial tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)}{\partial x_{ki}}&=\sum_{j=1}^{p}a_{ij}x_{kj} +\sum_{j=1}^{p}x_{kj}a_{ji}\\
&=\sum_{j=1}^{p}x_{kj}a_{ji}+\sum_{j=1}^{p}x_{kj}a_{ij}\\
\Rightarrow \frac{\partial tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)}{\partial \mathbf{X}}
&=\mathbf{X}(\mathbf{A}+\mathbf{A}^T)
\end{align*}
$$
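Because Theorems (8) and (9) differ only in which side the factor \((\mathbf{A}+\mathbf{A}^T)\) lands on, a numerical check helps catch a misplaced transpose. The sketch below tests Theorem (9) with a non-square \(\mathbf{X}\) and a non-symmetric \(\mathbf{A}\). The helper names and the data are illustrative assumptions.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def numerical_matrix_gradient(f, X, h=1e-6):
    """Entry (i, j) approximates d f / d x_ij by central differences."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


A = [[1.0, 2.0, 0.0], [-1.0, 0.5, 3.0], [4.0, 1.0, -2.0]]  # p x p, not symmetric
X = [[1.0, -1.0, 2.0], [0.5, 3.0, 0.0]]                     # n x p with n = 2, p = 3


def f(M):
    """f(X) = tr(X A X^T)."""
    return trace(matmul(matmul(M, A), transpose(M)))


G = numerical_matrix_gradient(f, X)
A_plus_AT = [[A[i][j] + A[j][i] for j in range(3)] for i in range(3)]
expected = matmul(X, A_plus_AT)  # Theorem (9): X (A + A^T), an n x p matrix
max_error = max(abs(G[i][j] - expected[i][j]) for i in range(2) for j in range(3))
print(max_error)  # expected to be tiny (rounding error only)
```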
Theorem (10): \(\mathbf{X}_{p \times p}\Rightarrow \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}}=|\mathbf{X}|(\mathbf{X}^{-1})^T\) (for invertible \(\mathbf{X}\))
\(Proof:\)
$$
\begin{align*}
|\mathbf{X}|=\det \mathbf{X}&=\sum_{j=1}^{p}x_{ij}(-1)^{i+j}\det\mathbf{X}_{ij}\\
&=\sum_{j=1}^{p}x_{ij}c_{ij}\\\\
\frac{\partial \det \mathbf{X}}{\partial x_{ij}}&=\frac{\partial\sum_{k=1}^{p}x_{ik}c_{ik}}{\partial x_{ij}}\\
&=c_{ij} \quad \text{(no cofactor } c_{ik} \text{ of row } i \text{ depends on } x_{ij}\text{)}\\
&=\left((\mathrm{adj}\,\mathbf{X})^T\right)_{ij}\\
&=\left((\det \mathbf{X})(\mathbf{X}^{-1})^T\right)_{ij}\\
\Rightarrow \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}}&=|\mathbf{X}|(\mathbf{X}^{-1})^T\\\\
\end{align*}
$$
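The key step of the proof is \(\frac{\partial \det \mathbf{X}}{\partial x_{ij}}=c_{ij}\). The sketch below checks that step numerically for a \(3 \times 3\) matrix by comparing a central-difference gradient of the determinant with the cofactor matrix. The helpers (`minor`, `det`, `cofactor_matrix`, `numerical_matrix_gradient`) are hypothetical names written for this example, and the recursive determinant is only meant for small matrices.

```python
def minor(X, i, j):
    """X with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(X) if k != i]


def det(X):
    """Determinant by cofactor expansion along the first row."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** j * X[0][j] * det(minor(X, 0, j)) for j in range(len(X)))


def cofactor_matrix(X):
    """Matrix C with entries c_ij = (-1)^(i+j) det(minor(X, i, j))."""
    n = len(X)
    return [[(-1) ** (i + j) * det(minor(X, i, j)) for j in range(n)] for i in range(n)]


def numerical_matrix_gradient(f, X, h=1e-6):
    """Entry (i, j) approximates d f / d x_ij by central differences."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


X = [[2.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.5, 0.0, 4.0]]  # arbitrary, det = 19.5
G = numerical_matrix_gradient(det, X)
C = cofactor_matrix(X)
max_error = max(abs(G[i][j] - C[i][j]) for i in range(3) for j in range(3))
print(max_error)  # expected to be tiny: the gradient of det is the cofactor matrix
```

For invertible \(\mathbf{X}\) the cofactor matrix equals \(|\mathbf{X}|(\mathbf{X}^{-1})^T\), which is the form stated in the theorem.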
----------------------------------------------------------------------------------------------------------------------------------------
$$
\det \mathbf{X}=\sum_{j=1}^nx_{ij}c_{ij}=\sum_{j=1}^n(-1)^{i+j}x_{ij}\det\tilde{\mathbf{X}}_{ij}\\
\begin{bmatrix}
x_{11}&x_{12}&\cdots&x_{1n}\\
x_{21}&x_{22}&\cdots&x_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
x_{n1}&x_{n2}&\cdots&x_{nn}
\end{bmatrix}
\begin{bmatrix}
c_{11}&c_{21}&\cdots&c_{n1}\\
c_{12}&c_{22}&\cdots&c_{n2}\\
\vdots&\vdots&\ddots&\vdots\\
c_{1n}&c_{2n}&\cdots&c_{nn}
\end{bmatrix}=
\begin{bmatrix}
\det \mathbf{X}&0&\cdots&0\\
0&\det \mathbf{X}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&\det \mathbf{X}
\end{bmatrix}\\
\mathbf{X}\mathbf{C}^T=(\det \mathbf{X})\mathbf{I}
$$
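\(\mathbf{C}^T\) is the adjugate (classical adjoint) of \(\mathbf{X}\), written \(\mathrm{adj}\mathbf{X}\).
If \(\mathbf{X}\) is invertible, then \(\mathbf{C}^T=(\det \mathbf{X})\mathbf{X}^{-1}\).
Why are only the diagonal entries equal to \(\det\mathbf{X}\), with zeros everywhere else?
If \(i\neq j\), the entry in row \(i\), column \(j\) of \(\mathbf{X}\mathbf{C}^T\) is \(\sum_{k=1}^{n} x_{ik} c_{jk}\).
By the Laplace expansion this sum equals 0:
it is the determinant of the matrix obtained from \(\mathbf{X}\) by replacing row \(j\) with row \(i\),
and a matrix with two identical rows has determinant 0.
For example, multiply row 2 of \(\mathbf{X}\) by column 1 of \(\mathbf{C}^T\):
this pairs \(x_{21}\) with \(c_{11}\), and so on, as in the figure (left: the present computation; right: the usual cofactor expansion).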
$$
\sum_{k=1}^{n} x_{2k} c_{1k} \Rightarrow
\det\begin{bmatrix}
x_{21}&x_{22}&\cdots&x_{2n}\\
x_{21}&x_{22}&\cdots&x_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
x_{n1}&x_{n2}&\cdots&x_{nn}
\end{bmatrix}=0
$$
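A concrete \(2 \times 2\) instance may make the identity \(\mathbf{X}\mathbf{C}^T=(\det \mathbf{X})\mathbf{I}\) easier to see. The matrix below is an arbitrary example chosen for illustration. Its cofactors are \(c_{11}=4\), \(c_{12}=-3\), \(c_{21}=-2\), \(c_{22}=1\), and \(\det\mathbf{X}=1\cdot 4-2\cdot 3=-2\):

$$
\mathbf{X}\mathbf{C}^T=
\begin{bmatrix}
1&2\\
3&4
\end{bmatrix}
\begin{bmatrix}
4&-2\\
-3&1
\end{bmatrix}=
\begin{bmatrix}
1\cdot 4+2\cdot(-3)&1\cdot(-2)+2\cdot 1\\
3\cdot 4+4\cdot(-3)&3\cdot(-2)+4\cdot 1
\end{bmatrix}=
\begin{bmatrix}
-2&0\\
0&-2
\end{bmatrix}
$$

The off-diagonal entries pair one row of \(\mathbf{X}\) with the cofactors of the other row, which is why they vanish.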
Theorem (11): \(\mathbf{X}_{p \times p}\Rightarrow \frac{\partial \ln|\mathbf{X}|}{\partial \mathbf{X}}=(\mathbf{X}^{-1})^T\) (for \(|\mathbf{X}|>0\))
\(Proof:\)
By Theorem (10) and the chain rule,
$$
\begin{align*}
\frac{\partial \ln\mathbf{|X|}}{\partial \mathbf{X}}
&=\frac{1}{\mathbf{|X|}}\frac{\partial }{\partial \mathbf{X}}\mathbf{|X|}\\
&=\frac{1}{\mathbf{|X|}}\mathbf{|X|}(\mathbf{X}^{-1})^T\\
&=(\mathbf{X}^{-1})^T
\end{align*}
$$
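A minimal numerical check of Theorem (11) for the \(2 \times 2\) case, where the inverse has a closed form. The function names and the test matrix (chosen so that \(|\mathbf{X}|>0\)) are assumptions made for this sketch.

```python
import math


def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def log_det(X):
    """f(X) = ln|X|, defined here only when det(X) > 0."""
    return math.log(det2(X))


def inverse_transposed_2x2(X):
    """(X^{-1})^T from the closed-form 2 x 2 inverse."""
    d = det2(X)
    return [[X[1][1] / d, -X[1][0] / d], [-X[0][1] / d, X[0][0] / d]]


def numerical_matrix_gradient(f, X, h=1e-6):
    """Entry (i, j) approximates d f / d x_ij by central differences."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += h
            minus[i][j] -= h
            grad[i][j] = (f(plus) - f(minus)) / (2 * h)
    return grad


X = [[2.0, 1.0], [0.5, 3.0]]  # det = 5.5 > 0
G = numerical_matrix_gradient(log_det, X)
expected = inverse_transposed_2x2(X)
max_error = max(abs(G[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print(max_error)  # expected to be tiny: G matches (X^{-1})^T
```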
Chain rule:
Suppose \(y = f (u)\) is differentiable with respect to \(u\), and \(u = g(x)\) is differentiable with respect to \(x\).
Then \(y = f (g(x))\) is differentiable with respect to \(x\),
and $$\frac{\mathrm{d} y}{\mathrm{d} x}=\frac{\mathrm{d} y}{\mathrm{d} u}\frac{\mathrm{d} u}{\mathrm{d} x}$$