[Math] Matrix Calculus

Math notes: matrix calculus
線代啟示錄
Wiki Matrix calculus

Introduction: definitions and basic properties of matrix derivatives

Definition: derivative of a scalar by a vector

Let \(f\) be a function of \(p\) independent variables \(x_1,x_2,\cdots,x_p\).
Let \(\mathbf{x} = [x_1,x_2,\cdots,x_p]^{T}\). Then
\(\frac{\partial f}{\partial \mathbf{x}} = \left [\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_p}\right ]^T\)

Theorem (1): \(\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial \mathbf{x}}=2\mathbf{x}\)

\(Proof:\) $$ \begin{align*} \frac{\partial \mathbf{x}^T\mathbf{x}}{\partial \mathbf{x}}&=\left[\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial x_1},\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial x_2},\cdots,\frac{\partial \mathbf{x}^T\mathbf{x}}{\partial x_p}\right]^T \\&=\left[\frac{\partial \sum_{i=1}^{p}x_i^2}{\partial x_1},\frac{\partial \sum_{i=1}^{p}x_i^2}{\partial x_2},\cdots,\frac{\partial \sum_{i=1}^{p}x_i^2}{\partial x_p}\right]^T \\ &=[2x_1,2x_2,\cdots,2x_p]^T \\ &=2[x_1,x_2,\cdots,x_p]^T \\ &=2\mathbf{x} \end{align*}$$

Theorem (2): for a constant vector \(\mathbf{y}\), \(\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}}=\mathbf{y}\), the vector analogue of the scalar rule \(\frac{\partial xy}{\partial x}=y\)

\(Proof:\) $$ \begin{align*} \frac{\partial \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}}&=\left[\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial x_1},\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial x_2},\cdots,\frac{\partial \mathbf{x}^T\mathbf{y}}{\partial x_p}\right]^T\\ &=\left[\frac{\partial \sum_{i=1}^{p}x_iy_i}{\partial x_1},\frac{\partial \sum_{i=1}^{p}x_iy_i}{\partial x_2},\cdots,\frac{\partial \sum_{i=1}^{p}x_iy_i}{\partial x_p}\right]^T\\ &=[y_1,y_2,\cdots,y_p]^T \\ &=\mathbf{y} \end{align*}$$
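
Theorems (1) and (2) can be sanity-checked numerically by comparing the closed forms with a central-difference estimate of the gradient defined above. The following is only a minimal pure-Python sketch: the helper names `numerical_gradient` and `dot`, the step size `h`, and the test vectors are illustrative assumptions, not part of the original notes.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_p]."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


x = [1.0, -2.0, 0.5]   # arbitrary test point (assumption)
y = [3.0, 4.0, -1.0]   # arbitrary constant vector (assumption)

grad_xx = numerical_gradient(lambda v: dot(v, v), x)  # Theorem (1): should be close to 2x
grad_xy = numerical_gradient(lambda v: dot(v, y), x)  # Theorem (2): should be close to y
```

Both functions are polynomials of degree at most two, so the central difference agrees with the exact gradient up to rounding error.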

Theorem (3): for a constant matrix \(\mathbf{A}\), \(\frac{\partial \mathbf{A}^T\mathbf{x}}{\partial \mathbf{x}}=\mathbf{A}\), the analogue of the scalar rule \(\frac{\partial ax}{\partial x}=a\)

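\(Proof:\) Following the column convention of the definition above, the \(j\)-th column of \(\frac{\partial \mathbf{A}^T\mathbf{x}}{\partial \mathbf{x}}\) is the derivative of the \(j\)-th component of \(\mathbf{A}^T\mathbf{x}\) with respect to \(\mathbf{x}\). $$ \begin{align*} Let~ \mathbf{A}&=[a_{ij}]_{p\times p} \\ \mathbf{A}^T\mathbf{x}&=\left [ \sum_{i=1}^{p}a_{i1}x_i,\sum_{i=1}^{p}a_{i2}x_i,\cdots,\sum_{i=1}^{p}a_{ip}x_i \right ]^T\\ \frac{\partial \mathbf{A}^T\mathbf{x}}{\partial \mathbf{x}}&=\begin{bmatrix} \frac{\partial }{\partial \mathbf{x}}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial \mathbf{x}}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial \mathbf{x}}\sum_{i=1}^{p}a_{ip}x_i \end{bmatrix}\\ &=\begin{bmatrix} \frac{\partial }{\partial x_1}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial x_1}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial x_1}\sum_{i=1}^{p}a_{ip}x_i\\ \frac{\partial }{\partial x_2}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial x_2}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial x_2}\sum_{i=1}^{p}a_{ip}x_i\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial }{\partial x_p}\sum_{i=1}^{p}a_{i1}x_i & \frac{\partial }{\partial x_p}\sum_{i=1}^{p}a_{i2}x_i & \cdots & \frac{\partial }{\partial x_p}\sum_{i=1}^{p}a_{ip}x_i \end{bmatrix}\\ &=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}\\ &=\mathbf{A} \end{align*}$$ With this convention \(\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}=\mathbf{A}^T\); the transpose-free form \(\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}=\mathbf{A}\) holds in the numerator (Jacobian) layout, where rows index the components of \(\mathbf{Ax}\) and columns index the variables.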

Theorem (4): \((i)~\frac{\partial \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=(\mathbf{A}+\mathbf{A}^T)\mathbf{x} \quad (ii)~If~ \mathbf{A}=\mathbf{A}^T\Rightarrow \frac{\partial \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}}=2\mathbf{A}\mathbf{x}\), the analogue of the scalar rule \(\frac{\partial }{\partial x}ax^2=2ax\)

\(Proof:\) $$ \begin{align*} Let~ \mathbf{A}&=[a_{ij}]_{p\times p} \\f(\mathbf{x})&=\mathbf{x}^T\mathbf{A}\mathbf{x}=\sum_{i=1}^{p}\sum_{j=1}^{p}x_ia_{ij}x_j \\ \frac{\partial f}{\partial x_k}&=\frac{\partial}{\partial x_k}\left [ \sum_{i=1}^{p}\sum_{j=1}^{p}x_ia_{ij}x_j \right ] \\&=\sum_{j=1}^{p}a_{kj}x_j+\sum_{i=1}^{p}x_ia_{ik} \\&=\sum_{j=1}^{p}a_{kj}x_j+\sum_{j=1}^{p}a_{jk}x_j \\&=(\mathbf{A}\mathbf{x})_k+(\mathbf{A}^T\mathbf{x})_k \\\Rightarrow \frac{\partial f}{\partial \mathbf{x}} &=\mathbf{A}\mathbf{x}+\mathbf{A}^T\mathbf{x}\\ &=(\mathbf{A}+\mathbf{A}^T)\mathbf{x} \end{align*}$$ Part (ii) follows by substituting \(\mathbf{A}^T=\mathbf{A}\).
Product rule: if \(h(x) = f(x)g(x)\) and both \(f(x)\) and \(g(x)\) are differentiable at \(x\), then
\(h'(x) = f'(x)g(x) + f(x)g'(x)\)
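
To see that the \(\mathbf{A}^T\) term in Theorem (4) really matters, one could test the formula on a matrix that is deliberately not symmetric. The sketch below is pure Python; the function names, the matrix `A`, and the point `x` are hypothetical choices made only for illustration.

```python
def quadratic_form(A, x):
    """x^T A x = sum_i sum_j x_i a_ij x_j."""
    p = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(p) for j in range(p))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_p]."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def closed_form_gradient(A, x):
    """(A + A^T) x, the formula stated in Theorem (4)(i)."""
    p = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(p)) for k in range(p)]


A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, -2.0]]   # non-symmetric on purpose (assumption)
x = [1.0, -2.0, 0.5]     # arbitrary test point (assumption)

numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)
analytic = closed_form_gradient(A, x)
```

For a symmetric `A` the same check would also match \(2\mathbf{A}\mathbf{x}\), which is case (ii).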

Definition: derivative of a scalar by a matrix

Let \(f\) be a function of an \(m \times n\) matrix variable \(\mathbf{X}\): $$ \mathbf{X}_{m\times n}= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} $$
Assume every \(\frac{\partial f}{\partial x_{ij}}\) exists. Then
$$ \frac{\partial f}{\partial \mathbf{X}}= \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} & \cdots & \frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{2n}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f}{\partial x_{m1}} & \frac{\partial f}{\partial x_{m2}} & \cdots & \frac{\partial f}{\partial x_{mn}}\\ \end{bmatrix}_{m\times n} $$

Definition: derivative of a matrix by a scalar

Let \(\mathbf{F}=[f_{ij}]_{m\times n}\) be a matrix-valued function of a scalar variable \(x\),
and assume every \(\frac{\partial f_{ij}}{\partial x}\) exists. Then
$$ \frac{\partial \mathbf{F}}{\partial x}= \begin{bmatrix} \frac{\partial f_{11}}{\partial x} & \frac{\partial f_{12}}{\partial x} & \cdots & \frac{\partial f_{1n}}{\partial x}\\ \frac{\partial f_{21}}{\partial x} & \frac{\partial f_{22}}{\partial x} & \cdots & \frac{\partial f_{2n}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_{m1}}{\partial x} & \frac{\partial f_{m2}}{\partial x} & \cdots & \frac{\partial f_{mn}}{\partial x}\\ \end{bmatrix}_{m\times n} $$

Theorem (5): \(\mathbf{X}_{n \times p} \Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{X})}{\partial \mathbf{X}}=2\mathbf{X}\)

\(Proof:\) $$ \begin{align*} \mathbf{X}^T\mathbf{X}&= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}^T \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \\&= \begin{bmatrix} \sum_{i=1}^{n}x_{i1}x_{i1} & \sum_{i=1}^{n}x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n}x_{i1}x_{ip}\\ \sum_{i=1}^{n}x_{i2}x_{i1} & \sum_{i=1}^{n}x_{i2}x_{i2} & \cdots & \sum_{i=1}^{n}x_{i2}x_{ip}\\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}x_{ip}x_{i1} & \sum_{i=1}^{n}x_{ip}x_{i2} & \cdots & \sum_{i=1}^{n}x_{ip}x_{ip} \end{bmatrix}\\ \\ tr(\mathbf{X}^T\mathbf{X})&=\left ( \sum_{i=1}^{n}x_{i1}x_{i1}+ \sum_{i=1}^{n}x_{i2}x_{i2}+\cdots+ \sum_{i=1}^{n}x_{ip}x_{ip} \right ) \\&=\sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij}^2\\ \\ \frac{\partial tr(\mathbf{X}^T\mathbf{X})}{\partial x_{kl}} &=\frac{\partial \sum_{i=1}^{n}\sum_{j=1}^{p}x_{ij}^2}{\partial x_{kl}} \\&=2x_{kl} \\\Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{X})}{\partial \mathbf{X}}&=2\mathbf{X} \end{align*} $$

Theorem (6): \(\mathbf{A}_{n \times p},\mathbf{X}_{n \times p} \Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X})} {\partial \mathbf{X}} = \mathbf{A}\)

\(Proof:\) $$ \begin{align*} \mathbf{A}^T\mathbf{X}&= \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{i=1}^{n}a_{i1}x_{i1} & \sum_{i=1}^{n}a_{i1}x_{i2} & \cdots & \sum_{i=1}^{n}a_{i1}x_{ip}\\ \sum_{i=1}^{n}a_{i2}x_{i1} & \sum_{i=1}^{n}a_{i2}x_{i2} & \cdots & \sum_{i=1}^{n}a_{i2}x_{ip}\\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{ip}x_{i1} & \sum_{i=1}^{n}a_{ip}x_{i2} & \cdots & \sum_{i=1}^{n}a_{ip}x_{ip} \end{bmatrix} \\\\ &tr(\mathbf{A}^T\mathbf{X})=\sum_{i=1}^{n}\sum_{j=1}^{p}a_{ij}x_{ij}\\ &\frac{\partial tr(\mathbf{A}^T\mathbf{X})}{\partial x_{ij}}=a_{ij} \Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X})}{\partial \mathbf{X}}=\mathbf{A} \end{align*}$$
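
The scalar-by-matrix definition translates directly into an entry-by-entry finite difference, which gives a quick check of Theorems (5) and (6). This is a rough pure-Python sketch; `numerical_matrix_gradient`, `transpose`, `matmul`, `trace`, and the \(3 \times 2\) test matrices are assumptions introduced here for illustration.

```python
def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix [df/dx_ij]."""
    n, p = len(X), len(X[0])
    G = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


X = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]   # n x p = 3 x 2 (assumption)
A = [[2.0, 0.0], [1.0, 1.0], [-1.0, 4.0]]   # same shape as X (assumption)

grad_5 = numerical_matrix_gradient(lambda M: trace(matmul(transpose(M), M)), X)  # Theorem (5): close to 2X
grad_6 = numerical_matrix_gradient(lambda M: trace(matmul(transpose(A), M)), X)  # Theorem (6): close to A
```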

Theorem (7): \(\mathbf{A}_{n \times n},\mathbf{X}_{n \times p},\mathbf{B}_{p \times n} \Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{X}\mathbf{B})} {\partial \mathbf{X}} = \mathbf{AB}^T\)

\(Proof:\) $$ \begin{align*} \mathbf{A}^T\mathbf{XB}&= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}^T \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{p1} & b_{p2} & \cdots & b_{pn} \end{bmatrix}\\ &=\begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{p1} & b_{p2} & \cdots & b_{pn} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{i=1}^{n}a_{i1}x_{i1} & \sum_{i=1}^{n}a_{i1}x_{i2} & \cdots & \sum_{i=1}^{n}a_{i1}x_{ip}\\ \sum_{i=1}^{n}a_{i2}x_{i1} & \sum_{i=1}^{n}a_{i2}x_{i2} & \cdots & \sum_{i=1}^{n}a_{i2}x_{ip}\\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{in}x_{i1} & \sum_{i=1}^{n}a_{in}x_{i2} & \cdots & \sum_{i=1}^{n}a_{in}x_{ip} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{p1} & b_{p2} & \cdots & b_{pn} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{j=1}^{p}\sum_{i=1}^{n}a_{i1}x_{ij}b_{j1} & \sum_{j=1}^{p}\sum_{i=1}^{n}a_{i1}x_{ij}b_{j2} & \cdots & \sum_{j=1}^{p}\sum_{i=1}^{n}a_{i1}x_{ij}b_{jn}\\ \sum_{j=1}^{p}\sum_{i=1}^{n}a_{i2}x_{ij}b_{j1} & \sum_{j=1}^{p}\sum_{i=1}^{n}a_{i2}x_{ij}b_{j2} & \cdots & \sum_{j=1}^{p}\sum_{i=1}^{n}a_{i2}x_{ij}b_{jn}\\ \vdots & \vdots & \ddots 
& \vdots \\ \sum_{j=1}^{p}\sum_{i=1}^{n}a_{in}x_{ij}b_{j1} & \sum_{j=1}^{p}\sum_{i=1}^{n}a_{in}x_{ij}b_{j2} & \cdots & \sum_{j=1}^{p}\sum_{i=1}^{n}a_{in}x_{ij}b_{jn} \end{bmatrix}\\\\ &tr(\mathbf{A}^T\mathbf{XB})=\sum_{k=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{n}a_{ik}x_{ij}b_{jk}\\ &\frac{\partial tr(\mathbf{A}^T\mathbf{XB})}{\partial x_{ij}} =\sum_{k=1}^{n}a_{ik}b_{jk} \Rightarrow \frac{\partial tr(\mathbf{A}^T\mathbf{XB})}{\partial \mathbf{X}} =\mathbf{A}\mathbf{B}^T\\\\ &\mathbf{A}\mathbf{B}^T =\begin{bmatrix} \sum_{i=1}^{n}a_{1i}b_{1i} & \sum_{i=1}^{n}a_{1i}b_{2i} & \cdots & \sum_{i=1}^{n}a_{1i}b_{pi}\\ \sum_{i=1}^{n}a_{2i}b_{1i} & \sum_{i=1}^{n}a_{2i}b_{2i} & \cdots & \sum_{i=1}^{n}a_{2i}b_{pi}\\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{ni}b_{1i} & \sum_{i=1}^{n}a_{ni}b_{2i} & \cdots & \sum_{i=1}^{n}a_{ni}b_{pi} \end{bmatrix} \end{align*}$$
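
The shapes in Theorem (7) are easy to mix up, so a small numerical check with \(n \neq p\) may help. In the sketch below (pure Python, with \(n = 2\) and \(p = 3\)), all helper names and matrix values are illustrative assumptions; the only claim tested is that the finite-difference derivative of \(tr(\mathbf{A}^T\mathbf{X}\mathbf{B})\) has the same entries as \(\mathbf{A}\mathbf{B}^T\).

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix [df/dx_ij]."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


A = [[1.0, 2.0], [0.0, -1.0]]                # n x n
X = [[0.5, 1.0, -2.0], [3.0, 0.0, 1.5]]      # n x p
B = [[1.0, 0.0], [2.0, 1.0], [-1.0, 4.0]]    # p x n

numeric = numerical_matrix_gradient(
    lambda M: trace(matmul(matmul(transpose(A), M), B)), X)
analytic = matmul(A, transpose(B))           # A B^T, an n x p matrix like X
```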

Theorem (8): \(\mathbf{A}_{n \times n},\mathbf{X}_{n \times p} \Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{A}\mathbf{X})} {\partial \mathbf{X}} = (\mathbf{A}+\mathbf{A}^T)\mathbf{X}\)

\(Proof:\) $$ \begin{align*} \mathbf{X}^T\mathbf{AX}&= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}^T \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}\\ &=\begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1}\\ x_{12} & x_{22} & \cdots & x_{n2}\\ \vdots & \vdots & \ddots & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{i=1}^{n}x_{i1}a_{i1} & \sum_{i=1}^{n}x_{i1}a_{i2} & \cdots & \sum_{i=1}^{n}x_{i1}a_{in}\\ \sum_{i=1}^{n}x_{i2}a_{i1} & \sum_{i=1}^{n}x_{i2}a_{i2} & \cdots & \sum_{i=1}^{n}x_{i2}a_{in}\\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}x_{ip}a_{i1} & \sum_{i=1}^{n}x_{ip}a_{i2} & \cdots & \sum_{i=1}^{n}x_{ip}a_{in} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{j=1}^{n}\sum_{i=1}^{n}x_{i1}a_{ij}x_{j1} & \sum_{j=1}^{n}\sum_{i=1}^{n}x_{i1}a_{ij}x_{j2} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}x_{i1}a_{ij}x_{jp} \\ \sum_{j=1}^{n}\sum_{i=1}^{n}x_{i2}a_{ij}x_{j1} & \sum_{j=1}^{n}\sum_{i=1}^{n}x_{i2}a_{ij}x_{j2} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}x_{i2}a_{ij}x_{jp} \\ \vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{n}\sum_{i=1}^{n}x_{ip}a_{ij}x_{j1} & \sum_{j=1}^{n}\sum_{i=1}^{n}x_{ip}a_{ij}x_{j2} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}x_{ip}a_{ij}x_{jp} \end{bmatrix} \end{align*} $$ $$ \begin{align*} tr(\mathbf{X}^T\mathbf{A}\mathbf{X}) &=\sum_{k=1}^{p}\sum_{j=1}^{n}\sum_{i=1}^{n}x_{ik}a_{ij}x_{jk} \\ \frac{\partial tr(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial x_{lm}}&=\sum_{j=1}^{n}a_{lj}x_{jm} +\sum_{i=1}^{n}x_{im}a_{il}\\ &=\sum_{j=1}^{n}a_{lj}x_{jm} +\sum_{j=1}^{n}a_{jl}x_{jm}\\ &=(\mathbf{A}\mathbf{X})_{lm}+(\mathbf{A}^T\mathbf{X})_{lm}\\ \Rightarrow \frac{\partial tr(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial \mathbf{X}}&=(\mathbf{A}+\mathbf{A}^T)\mathbf{X} \end{align*} $$
Product rule: if \(h(x) = f(x)g(x)\) and both \(f(x)\) and \(g(x)\) are differentiable at \(x\), then
\(h'(x) = f'(x)g(x) + f(x)g'(x)\)

Theorem (9): \(\mathbf{A}_{p \times p},\mathbf{X}_{n \times p} \Rightarrow \frac{\partial tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)} {\partial \mathbf{X}} = \mathbf{X}(\mathbf{A}+\mathbf{A}^T)\)

\(Proof:\) $$ \begin{align*} \mathbf{XA}\mathbf{X}^T&= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}^T\\ &=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ x_{21} & x_{22} & \cdots & x_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix} \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1}\\ x_{12} & x_{22} & \cdots & x_{n2}\\ \vdots & \vdots & \ddots & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{i=1}^{p}x_{1i}a_{i1} & \sum_{i=1}^{p}x_{1i}a_{i2} & \cdots & \sum_{i=1}^{p}x_{1i}a_{ip}\\ \sum_{i=1}^{p}x_{2i}a_{i1} & \sum_{i=1}^{p}x_{2i}a_{i2} & \cdots & \sum_{i=1}^{p}x_{2i}a_{ip}\\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{p}x_{ni}a_{i1} & \sum_{i=1}^{p}x_{ni}a_{i2} & \cdots & \sum_{i=1}^{p}x_{ni}a_{ip} \end{bmatrix} \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1}\\ x_{12} & x_{22} & \cdots & x_{n2}\\ \vdots & \vdots & \ddots & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{bmatrix}\\ &=\begin{bmatrix} \sum_{j=1}^{p}\sum_{i=1}^{p}x_{1i}a_{ij}x_{1j} & \sum_{j=1}^{p}\sum_{i=1}^{p}x_{1i}a_{ij}x_{2j} & \cdots & \sum_{j=1}^{p}\sum_{i=1}^{p}x_{1i}a_{ij}x_{nj} \\ \sum_{j=1}^{p}\sum_{i=1}^{p}x_{2i}a_{ij}x_{1j} & \sum_{j=1}^{p}\sum_{i=1}^{p}x_{2i}a_{ij}x_{2j} & \cdots & \sum_{j=1}^{p}\sum_{i=1}^{p}x_{2i}a_{ij}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{p}\sum_{i=1}^{p}x_{ni}a_{ij}x_{1j} & \sum_{j=1}^{p}\sum_{i=1}^{p}x_{ni}a_{ij}x_{2j} & \cdots & \sum_{j=1}^{p}\sum_{i=1}^{p}x_{ni}a_{ij}x_{nj} \end{bmatrix} \end{align*} $$ $$ \begin{align*} tr(\mathbf{X}\mathbf{A}\mathbf{X}^T) &=\sum_{k=1}^{n}\sum_{j=1}^{p}\sum_{i=1}^{p}x_{ki}a_{ij}x_{kj} \\ \frac{\partial tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)}{\partial x_{lm}}&=\sum_{j=1}^{p}a_{mj}x_{lj} +\sum_{i=1}^{p}x_{li}a_{im}\\ &=\sum_{j=1}^{p}x_{lj}a_{jm}+\sum_{j=1}^{p}x_{lj}a_{mj}\\ &=(\mathbf{X}\mathbf{A})_{lm}+(\mathbf{X}\mathbf{A}^T)_{lm}\\ \Rightarrow \frac{\partial tr(\mathbf{X}\mathbf{A}\mathbf{X}^T)}{\partial \mathbf{X}} &=\mathbf{X}(\mathbf{A}+\mathbf{A}^T) \end{align*} $$
Product rule: if \(h(x) = f(x)g(x)\) and both \(f(x)\) and \(g(x)\) are differentiable at \(x\), then
\(h'(x) = f'(x)g(x) + f(x)g'(x)\)
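
Theorems (8) and (9) differ only in which side \(\mathbf{X}\) is multiplied on, so checking both with the same \(\mathbf{X}\) makes the contrast concrete. The following pure-Python sketch uses hypothetical helpers and arbitrary test matrices (`A_n` is \(3 \times 3\), `A_p` is \(2 \times 2\), `X` is \(3 \times 2\)); none of these names come from the original notes.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]


def add(P, Q):
    """Entry-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix [df/dx_ij]."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


X = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]                      # n x p = 3 x 2
A_n = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [0.5, 0.0, -2.0]]    # n x n, non-symmetric
A_p = [[1.0, -1.0], [2.0, 0.5]]                                # p x p, non-symmetric

# Theorem (8): d tr(X^T A X) / dX = (A + A^T) X
grad_8 = numerical_matrix_gradient(
    lambda M: trace(matmul(matmul(transpose(M), A_n), M)), X)
closed_8 = matmul(add(A_n, transpose(A_n)), X)

# Theorem (9): d tr(X A X^T) / dX = X (A + A^T)
grad_9 = numerical_matrix_gradient(
    lambda M: trace(matmul(matmul(M, A_p), transpose(M))), X)
closed_9 = matmul(X, add(A_p, transpose(A_p)))
```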

Theorem (10): for an invertible \(\mathbf{X}_{n \times n}\), \(\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}}=|\mathbf{X}|(\mathbf{X}^{-1})^T\)

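\(Proof:\) Expand \(\det\mathbf{X}\) along row \(i\), where \(\tilde{\mathbf{X}}_{ij}\) is \(\mathbf{X}\) with row \(i\) and column \(j\) deleted and \(c_{ij}=(-1)^{i+j}\det\tilde{\mathbf{X}}_{ij}\) is the cofactor. None of the cofactors \(c_{i1},\cdots,c_{in}\) contains an entry of row \(i\), so they are constants with respect to \(x_{ij}\). $$ \begin{align*} |\mathbf{X}|=\det \mathbf{X}&=\sum_{j=1}^nx_{ij}(-1)^{i+j}\det\tilde{\mathbf{X}}_{ij}\\ &=\sum_{j=1}^nx_{ij}c_{ij}\\\\ \frac{\partial \det \mathbf{X}}{\partial x_{ij}}&=\frac{\partial\sum_{k=1}^{n}x_{ik}c_{ik}}{\partial x_{ij}}\\ &=c_{ij}\\ &=\left((\mathrm{adj}\mathbf{X})^T\right)_{ij}\\ &=\left((\det \mathbf{X})(\mathbf{X}^{-1})^T\right)_{ij}\\ \Rightarrow \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}}&=|\mathbf{X}|(\mathbf{X}^{-1})^T \end{align*} $$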
----------------------------------------------------------------------------------------------------------------------------------------
$$ \begin{align*} \det \mathbf{X}&=\sum_{j=1}^nx_{ij}c_{ij}=\sum_{j=1}^n(-1)^{i+j}x_{ij}\det\tilde{\mathbf{X}}_{ij}\\ \begin{bmatrix} x_{11}&x_{12}&\cdots&x_{1n}\\ x_{21}&x_{22}&\cdots&x_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{n1}&x_{n2}&\cdots&x_{nn} \end{bmatrix} \begin{bmatrix} c_{11}&c_{21}&\cdots&c_{n1}\\ c_{12}&c_{22}&\cdots&c_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ c_{1n}&c_{2n}&\cdots&c_{nn} \end{bmatrix}&= \begin{bmatrix} \det \mathbf{X}&0&\cdots&0\\ 0&\det \mathbf{X}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\det \mathbf{X} \end{bmatrix}\\ \mathbf{X}\mathbf{C}^T&=(\det \mathbf{X})\mathbf{I} \end{align*} $$ \(\mathbf{C}^T\) is the adjugate (classical adjoint) of \(\mathbf{X}\), written \(\mathrm{adj}\mathbf{X}\).
If \(\mathbf{X}\) is invertible, then \(\mathbf{C}^T=(\det \mathbf{X})\mathbf{X}^{-1}\).

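Why are only the diagonal entries equal to \(\det\mathbf{X}\), with every other entry 0?
If \(i\neq j\), the entry in row \(i\), column \(j\) of \(\mathbf{X}\mathbf{C}^T\) is \(\sum_{k=1}^{n} x_{ik} c_{jk}\).
By the Laplace expansion, this sum equals 0:
it is the determinant of the matrix obtained from \(\mathbf{X}\) by replacing row \(j\) with row \(i\). That matrix has two identical rows, so its determinant is 0.
Take row 2 of \(\mathbf{X}\) times column 1 of \(\mathbf{C}^T\) as an example:
The sum pairs each entry of row 2 with a cofactor of row 1 (for example \(x_{21}\) with \(c_{11}\)), whereas the ordinary cofactor expansion pairs each entry with its own cofactor (\(x_{11}\) with \(c_{11}\)).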
$$ \sum_{k=1}^{n} x_{2k} c_{1k} \Rightarrow \det\begin{bmatrix} x_{21}&x_{22}&\cdots&x_{2n}\\ x_{21}&x_{22}&\cdots&x_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{n1}&x_{n2}&\cdots&x_{nn} \end{bmatrix}=0 $$
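
Both the identity \(\mathbf{X}\mathbf{C}^T=(\det \mathbf{X})\mathbf{I}\) and Theorem (10) can be tried out on a small matrix by computing cofactors directly. The sketch below is pure Python and uses a recursive cofactor expansion, which is fine for a \(3 \times 3\) illustration but far too slow for large matrices; the function names and the test matrix are assumptions made for this example.

```python
def minor(X, i, j):
    """Return X with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(X) if k != i]


def det(X):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    if len(X) == 1:
        return X[0][0]
    return sum((-1) ** j * X[0][j] * det(minor(X, 0, j)) for j in range(len(X)))


def cofactor_matrix(X):
    """C = [c_ij] with c_ij = (-1)^(i+j) * det(minor(X, i, j))."""
    n = len(X)
    return [[(-1) ** (i + j) * det(minor(X, i, j)) for j in range(n)] for i in range(n)]


def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix [df/dx_ij]."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


X = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, 2.0, 4.0]]   # arbitrary invertible test matrix (assumption)
n = len(X)
C = cofactor_matrix(X)

# Entry (i, j) of X C^T is sum_k x_ik c_jk: det(X) on the diagonal, 0 elsewhere.
X_adj = [[sum(X[i][k] * C[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

# Theorem (10): d det(X) / d x_ij should equal the cofactor c_ij.
grad_det = numerical_matrix_gradient(det, X)
```

Because \(\det \mathbf{X}\) is linear in each single entry \(x_{ij}\), the central difference matches the cofactor up to rounding error.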

Theorem (11): for \(\mathbf{X}_{p \times p}\) with \(|\mathbf{X}|>0\), \(\frac{\partial \ln|\mathbf{X}|}{\partial \mathbf{X}}=(\mathbf{X}^{-1})^T\)

\(Proof:\) By Theorem (10) and the chain rule, $$ \begin{align*} \frac{\partial \ln|\mathbf{X}|}{\partial \mathbf{X}} &=\frac{1}{|\mathbf{X}|}\frac{\partial }{\partial \mathbf{X}}|\mathbf{X}|\\ &=\frac{1}{|\mathbf{X}|}|\mathbf{X}|(\mathbf{X}^{-1})^T\\ &=(\mathbf{X}^{-1})^T \end{align*} $$ Chain rule:
Let \(y = f(u)\) be differentiable with respect to \(u\), and let \(u = g(x)\) be differentiable with respect to \(x\).
Then \(y = f(g(x))\) is differentiable with respect to \(x\),
and $$\frac{\mathrm{d} y}{\mathrm{d} x}=\frac{\mathrm{d} y}{\mathrm{d} u}\frac{\mathrm{d} u}{\mathrm{d} x}$$
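
For a \(2 \times 2\) matrix the determinant and inverse have simple closed forms, which makes Theorem (11) easy to check by hand or numerically. The sketch below is pure Python; `det2`, `inverse_transpose2`, and the test matrix (chosen so that \(|\mathbf{X}| = 10 > 0\)) are illustrative assumptions.

```python
import math


def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def inverse_transpose2(X):
    """(X^{-1})^T for a 2 x 2 matrix, from the closed-form inverse."""
    d = det2(X)
    return [[X[1][1] / d, -X[1][0] / d],
            [-X[0][1] / d, X[0][0] / d]]


def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix [df/dx_ij]."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


X = [[3.0, 1.0],
     [2.0, 4.0]]   # det = 10 > 0, so ln|X| is defined (assumption)

numeric = numerical_matrix_gradient(lambda M: math.log(det2(M)), X)
analytic = inverse_transpose2(X)   # Theorem (11): (X^{-1})^T
```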
