Matrix Operations
Reference:
Wikipedia – Matrix calculus
The Wikipedia article describes matrix differentiation in detail.
Given two matrices $A=(a_{ij})_{m \times n}$ and $B=(b_{ij})_{m \times n}$, the Hadamard product and the Kronecker product are defined as follows:
Hadamard product: $A \circ B=(a_{ij} \cdot b_{ij})_{m \times n}$, also called the elementwise product.
Kronecker product: $A \otimes B=\begin{pmatrix}a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B\end{pmatrix}$; if $B$ is $p \times q$, the result is $mp \times nq$.
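Both products map directly onto NumPy; the following is an illustrative sketch (the concrete matrices are made up for the example):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Hadamard (elementwise) product: same-shape matrices, entrywise multiply
hadamard = A * B          # (a_ij * b_ij), shape stays (2, 2)

# Kronecker product: each a_ij scales a full copy of B
kron = np.kron(A, B)      # shape (2*2, 2*2) = (4, 4)

print(hadamard)           # [[ 5 12] [21 32]]
print(kron.shape)         # (4, 4)
```

Note that `*` on NumPy arrays is always elementwise; the ordinary matrix product is `@` (or `np.matmul`).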
Matrix derivatives:
1. Derivative of a matrix $Y$ with respect to a scalar $x_i$:
Differentiate each element, then transpose. Note: differentiating an $M \times N$ matrix yields an $N \times M$ matrix.
$\frac{\partial Y}{\partial x_{i}}=\left[\frac{\partial Y_{ij}}{\partial x_{i}}\right]^T$
2. Derivative of a scalar $y_i$ with respect to a column vector $x$:
$\frac{\partial y_i}{\partial x}=\begin{bmatrix} \frac{\partial y_{i}}{\partial x_{1}} \\ \frac{\partial y_{i}}{\partial x_{2}} \\ \vdots \end{bmatrix}$
3. Derivative of a row vector $y^T$ with respect to a column vector $x$:
Note: differentiating a $1 \times M$ matrix with respect to an $N \times 1$ matrix yields an $N \times M$ matrix.
$\frac{\partial y^T}{\partial x}=\begin{bmatrix} \frac{\partial y_{1}}{\partial x} & \frac{\partial y_{2}}{\partial x} & \cdots & \frac{\partial y_{n}}{\partial x}\end{bmatrix}$, i.e. the $(i,j)$ entry is $\frac{\partial y_{j}}{\partial x_{i}}$.
This gives the formulas:
① $\frac{\partial x^T}{\partial x}=I$;  ② $\frac{\partial (Ax)^T}{\partial x}=A^T$
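Identity ② can be sanity-checked numerically. The sketch below (NumPy, not from the original text) builds the denominator-layout Jacobian $J_{ij}=\partial y_j/\partial x_i$ by finite differences and compares it with $A^T$:

```python
import numpy as np

def jacobian_denominator_layout(f, x, eps=1e-6):
    """J[i, j] = d f(x)_j / d x_i, matching the layout of dy^T/dx above."""
    y0 = f(x)
    J = np.zeros((x.size, y0.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[i] = (f(x + dx) - y0) / eps   # row i: sensitivity of all y_j to x_i
    return J

A = np.array([[1., 2., 0.], [0., 3., 1.]])   # 2x3, so y = Ax is 2x1
x = np.array([0.5, -1.0, 2.0])

J = jacobian_denominator_layout(lambda v: A @ v, x)
print(J.shape)                     # (3, 2): a 1xM over Nx1 gives NxM
print(np.allclose(J, A.T, atol=1e-4))
```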
4. Derivative of a column vector $y$ with respect to a row vector $x^T$:
Note: differentiating an $M \times 1$ matrix with respect to a $1 \times N$ matrix yields an $M \times N$ matrix.
$\frac{\partial y}{\partial x^T}=\left(\frac{\partial y^T}{\partial x}\right)^T$
5. Derivatives of vector products with respect to a column vector $x$:
$\frac{\partial (uv^T)}{\partial x}=\left(\frac{\partial u}{\partial x}\right)v^T+u\left(\frac{\partial v^T}{\partial x}\right)$
$\frac{\partial (u^Tv)}{\partial x}=\left(\frac{\partial u^T}{\partial x}\right)v+\left(\frac{\partial v^T}{\partial x}\right)u$
① $\frac{\partial (x^TA)}{\partial x}=\left(\frac{\partial x^T}{\partial x}\right)A+x^T\left(\frac{\partial A}{\partial x}\right)=IA+0=A$;
② $\frac{\partial (Ax)}{\partial x^T}=\left[\frac{\partial (x^TA^T)}{\partial x}\right]^T=(A^T)^T=A$;
③ $\frac{\partial (x^TAx)}{\partial x}=\left(\frac{\partial x^T}{\partial x}\right)Ax+\left[\frac{\partial (Ax)^T}{\partial x}\right]x=Ax+A^Tx$; when $A$ is symmetric this reduces to $2Ax$.
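Identity ③ is easy to verify against a numerical gradient; here is a minimal sketch (NumPy, example matrices chosen for illustration; $A$ is deliberately non-symmetric so the two terms differ):

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

A = np.array([[2., 1.], [0., 3.]])           # non-symmetric on purpose
x = np.array([1.5, -0.5])

analytic = A @ x + A.T @ x                    # formula ③: Ax + A^T x
numeric = num_grad(lambda v: v @ A @ v, x)    # gradient of x^T A x
print(np.allclose(analytic, numeric, atol=1e-4))
```

Central differences are exact (up to rounding) for quadratic forms, so the agreement here is tight.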
6. 矩阵 Y Y Y 对列向量 x x x 求导:
将 Y Y Y 对 x x x 的每个分量求偏导构成一个超向量(该向量每个元素都为一个矩阵)
[ ∂ y i j ] ∂ [ x 1 x 2 ⋮ x n ] = [ ∂ [ y i j ] ∂ x 1 ∂ [ y i j ] ∂ x 2 ⋮ ∂ [ y i j ] ∂ x n ] \frac{\begin{bmatrix} \partial y_{ij} \end{bmatrix}}{\partial \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}=\begin{bmatrix} \frac{\partial [y_{ij}]}{\partial x_1} \\ \frac{\partial [y_{ij}]}{\partial x_2}\\ \vdots \\ \frac{\partial [y_{ij}]}{\partial x_n}\end{bmatrix} ∂⎣⎢⎡x1x2⋮xn⎦⎥⎤[∂yij]=⎣⎢⎢⎢⎢⎡∂x1∂[yij]∂x2∂[yij]⋮∂xn∂[yij]⎦⎥⎥⎥⎥⎤
注: ∂ [ y i j ] ∂ x n \frac{\partial [y_{ij}]}{\partial x_n} ∂xn∂[yij] 为一个矩阵。
7. Derivative of a matrix product with respect to a column vector $x$:
$\frac{\partial (uv)}{\partial x}=\left(\frac{\partial u}{\partial x}\right)v+u\left(\frac{\partial v}{\partial x}\right)$
① $\frac{\partial (x^TA)}{\partial x}=\left(\frac{\partial x^T}{\partial x}\right)A+x^T\left(\frac{\partial A}{\partial x}\right)=IA+0=A$
8. Derivative of a scalar $y_i$ with respect to a matrix $X$:
Differentiate $y_i$ with respect to each element of $X$:
$\frac{\partial y_i}{\partial X}=\frac{\partial y_i}{\partial [x_{ij}]}$
① $y_i=u^TXv=\sum_i\sum_j u_i\,x_{ij}\,v_j \;\Rightarrow\; \frac{\partial y_i}{\partial X}=uv^T$;
$y_i=u^TX^TXu \;\Rightarrow\; \frac{\partial y_i}{\partial X}=2Xuu^T$;
② $y_i=(Xu-v)^T(Xu-v)$, then
$\frac{\partial y_i}{\partial X}=\frac{\partial (u^TX^TXu-2v^TXu+v^Tv)}{\partial X}=2Xuu^T-2vu^T+0=2(Xu-v)u^T$
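The least-squares gradient in ② can be checked entry by entry with central differences; the following sketch (NumPy, random data chosen only for illustration) compares it with the closed form $2(Xu-v)u^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
u = rng.standard_normal(3)
v = rng.standard_normal(4)

analytic = 2 * np.outer(X @ u - v, u)        # 2(Xu - v)u^T, shape (4, 3)

# central finite differences over every entry of X
f = lambda M: float((M @ u - v) @ (M @ u - v))   # residual sum of squares
eps = 1e-6
numeric = np.zeros_like(X)
for a in range(X.shape[0]):
    for b in range(X.shape[1]):
        E = np.zeros_like(X)
        E[a, b] = eps
        numeric[a, b] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))
```

This is the gradient used when deriving the normal equations for linear least squares.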
9. Derivative of a matrix $Y$ with respect to a matrix $X$:
Differentiate each element of $Y$ with respect to $X$, producing a hyper-matrix whose elements are matrices.
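In practice one often avoids the hyper-matrix by vectorizing: with $\mathrm{vec}(\cdot)$ stacking columns, the identity $\mathrm{vec}(AXB)=(B^T \otimes A)\,\mathrm{vec}(X)$ turns the four-index derivative of $Y=AXB$ with respect to $X$ into the ordinary Kronecker-product matrix $B^T \otimes A$. A quick numerical check of the identity (NumPy sketch, random shapes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# vec(.) stacks columns, i.e. flatten in Fortran (column-major) order
vec = lambda M: M.flatten(order="F")

lhs = vec(A @ X @ B)                 # vec of the product, length 2*5 = 10
rhs = np.kron(B.T, A) @ vec(X)       # (B^T kron A) is (10, 12), vec(X) is 12
print(np.allclose(lhs, rhs))
```

Because the map $X \mapsto AXB$ is linear, $B^T \otimes A$ is exactly its Jacobian in the vectorized coordinates.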