Softmax: Formula and Gradient Computation

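Softmax is used for multi-class classification: it maps a vector of scores to the probabilities that the predicted object belongs to each class.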

Formula

$$y_i = S(\boldsymbol{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i = 1, \dots, C$$

  • $\boldsymbol{z}$ is the output of the previous layer and the input to softmax; it has dimension $C$.
  • $y_i$ is the predicted probability that the object belongs to class $i$.
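
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation. The function name `softmax` and the subtraction of `max(z)` before exponentiating are additions that do not appear in the text; the shift cancels between numerator and denominator, so the result matches the formula while avoiding overflow for large scores.

```python
import math

def softmax(z):
    """Return y_i = exp(z_i) / sum_j exp(z_j) for a list of scores z."""
    # Assumption (not in the text): shift by max(z) for numerical stability.
    # exp(z_i - m) / sum_j exp(z_j - m) equals the original ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

y = softmax([1.0, 2.0, 3.0])
print(y)       # three probabilities, increasing with the score
print(sum(y))  # the probabilities sum to 1 up to rounding
```

Because only score differences matter, `softmax([1.0, 2.0, 3.0])` and `softmax([101.0, 102.0, 103.0])` give the same probabilities up to rounding.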

Gradient

The computational graph relating the variables is shown above. Given the gradient with respect to $\boldsymbol{y}$, $\frac{\partial l}{\partial y_i},\ i = 1, \dots, C$, we want the gradient with respect to $\boldsymbol{z}$, $\frac{\partial l}{\partial z_j},\ j = 1, \dots, C$.

The computational graph shows that each component $z_j$ of $\boldsymbol{z}$ contributes to every component of $\boldsymbol{y}$, so by the chain rule:

$$\frac{\partial l}{\partial z_j} = \sum_{i=1}^{C} \frac{\partial l}{\partial y_i} \frac{\partial y_i}{\partial z_j}$$

Since $\frac{\partial l}{\partial y_i}$ is known, all that remains is to compute $\frac{\partial y_i}{\partial z_j}$.
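
To make the chain-rule sum concrete, here is a small sketch that applies it to an arbitrary Jacobian. The helper name `chain_rule` and the 2x2 numbers are purely illustrative assumptions, not softmax values; the convention is `jacobian[i][j]` $= \partial y_i / \partial z_j$ and `grad_y[i]` $= \partial l / \partial y_i$.

```python
def chain_rule(grad_y, jacobian):
    """Compute dl/dz_j = sum_i (dl/dy_i) * (dy_i/dz_j) for every j."""
    n_out = len(grad_y)        # number of components of y
    n_in = len(jacobian[0])    # number of components of z
    return [
        sum(grad_y[i] * jacobian[i][j] for i in range(n_out))
        for j in range(n_in)
    ]

# Hypothetical numbers chosen only so the sum is easy to follow by hand.
jacobian = [[1.0, 2.0],
            [3.0, 4.0]]
grad_y = [10.0, 100.0]

grad_z = chain_rule(grad_y, jacobian)
print(grad_z)  # [310.0, 420.0]: 10*1 + 100*3 and 10*2 + 100*4
```

In matrix terms this is the transposed Jacobian applied to the upstream gradient.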

For convenience, write $\sum_C$ for $\sum_{j=1}^{C} e^{z_j}$.

(1) When $i = j$:

$$
\begin{aligned}
\frac{\partial y_i}{\partial z_j} &= \frac{e^{z_i}\sum_C - e^{z_i}e^{z_i}}{\left(\sum_C\right)^2} \\
&= \frac{e^{z_i}}{\sum_C} - \left(\frac{e^{z_i}}{\sum_C}\right)^2 \\
&= y_i - y_i^2 \\
&= y_i(1 - y_i)
\end{aligned}
$$
(2) When $i \neq j$:

$$
\begin{aligned}
\frac{\partial y_i}{\partial z_j} &= \frac{0 \cdot \sum_C - e^{z_i}e^{z_j}}{\left(\sum_C\right)^2} \\
&= -\frac{e^{z_i}}{\sum_C} \cdot \frac{e^{z_j}}{\sum_C} \\
&= -y_i y_j
\end{aligned}
$$
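
Substituting the two cases into the chain-rule sum gives $\frac{\partial l}{\partial z_j} = y_j \left( \frac{\partial l}{\partial y_j} - \sum_{i=1}^{C} \frac{\partial l}{\partial y_i} y_i \right)$, since the $i = j$ term contributes $\frac{\partial l}{\partial y_j} y_j (1 - y_j)$ and every other term contributes $-\frac{\partial l}{\partial y_i} y_i y_j$. The sketch below checks the two cases against central finite differences and evaluates this combined form. All function names, the test vector, the upstream gradient, and the step size `eps` are assumptions made for illustration.

```python
import math

def softmax(z):
    m = max(z)  # stability shift; an addition, not part of the derivation
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(y):
    """J[i][j] = dy_i/dz_j: y_i(1 - y_i) if i == j, else -y_i * y_j."""
    C = len(y)
    return [[y[i] * (1.0 - y[i]) if i == j else -y[i] * y[j]
             for j in range(C)] for i in range(C)]

def softmax_backward(y, grad_y):
    """dl/dz_j = y_j * (dl/dy_j - sum_i dl/dy_i * y_i)."""
    dot = sum(g * yi for g, yi in zip(grad_y, y))
    return [yj * (gj - dot) for yj, gj in zip(y, grad_y)]

def numeric_jacobian(z, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    C = len(z)
    J = [[0.0] * C for _ in range(C)]
    for j in range(C):
        z_plus = list(z)
        z_plus[j] += eps
        z_minus = list(z)
        z_minus[j] -= eps
        y_plus, y_minus = softmax(z_plus), softmax(z_minus)
        for i in range(C):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return J

z = [0.5, -1.0, 2.0]            # hypothetical scores
y = softmax(z)
J = softmax_jacobian(y)
J_num = numeric_jacobian(z)
max_err = max(abs(J[i][j] - J_num[i][j])
              for i in range(3) for j in range(3))
print(max_err)                  # small if the analytic cases are right

grad_y = [0.1, -0.2, 0.3]       # hypothetical upstream gradient dl/dy
grad_z = softmax_backward(y, grad_y)
print(grad_z)
```

The combined form needs only one dot product and never builds the full $C \times C$ Jacobian, which is why it is a common way to write the backward pass.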
