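This question is about the following code from the article "自分でニューラルネットワークを作ろう" ("Let's Build a Neural Network Ourselves"), https://qiita.com/takahiro_itazuri/items/d2bea1c643d7cca11352#comment-a59cd26161ee56ea1220 :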
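```python
# Update the weights (excerpt from a method of the article's network class; np is numpy)
# w_ho: hidden-to-output weights, w_ih: input-to-hidden weights, lr: learning rate
self.w_ho += self.lr * np.dot((e_o * self.daf(o_o)), o_h.T)
self.w_ih += self.lr * np.dot((e_h * self.daf(o_h)), o_i.T)
```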
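Why do the update formulas take this form?
Looking at the code, there seem to be only a "hidden layer" and an "output layer".
So the error is multiplied by the derivative of the activation, and then by the hidden-layer output?
What does this look like written as a mathematical expression? I would appreciate an explanation in formulas. Does differentiating really give this form?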
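For reference, here is a sketch of how the chain rule would give the first update line. It rests on assumptions that the quoted code does not confirm: a squared-error loss, an activation f with o_o = f(x_o) and x_o = w_ho · o_h, and `daf` returning f'(x) evaluated through the output (for a sigmoid, o(1 − o)). The symbols E, η (for `lr`), and x are introduced here for illustration only.

```latex
% Assumed loss and forward pass
E = \tfrac{1}{2}\sum_k (t_k - o^{o}_k)^2, \qquad
x^{o}_k = \sum_j w^{ho}_{kj}\, o^{h}_j, \qquad
o^{o}_k = f(x^{o}_k)

% Chain rule for one hidden-to-output weight
\frac{\partial E}{\partial w^{ho}_{kj}}
  = -(t_k - o^{o}_k)\, f'(x^{o}_k)\, o^{h}_j

% Gradient descent step, element-wise and in matrix form
\Delta w^{ho}_{kj} = -\eta\, \frac{\partial E}{\partial w^{ho}_{kj}}
  = \eta\, (e_o)_k\, f'(x^{o}_k)\, o^{h}_j
\quad\Longleftrightarrow\quad
\Delta W^{ho} = \eta\, \bigl(e_o \odot f'(x^{o})\bigr)\, o_h^{\top}

% Same steps for one input-to-hidden weight
\frac{\partial E}{\partial w^{ih}_{ji}}
  = -\Bigl[\sum_k (t_k - o^{o}_k)\, f'(x^{o}_k)\, w^{ho}_{kj}\Bigr]\, f'(x^{h}_j)\, o^{i}_i
```

Under these assumptions the matrix form is exactly `self.lr * np.dot(e_o * self.daf(o_o), o_h.T)`. The second update line has the same outer shape, with `e_h` playing the role of the bracketed sum. Whether `e_h` includes the factor f'(x_o) inside that sum depends on how the article defines it. If `e_h` is the plain weighted sum of output errors, as in the expression written out later in this question, the two would agree only up to that factor.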
The weight update formula is apparently the one given on the following page:
https://qiita.com/perrying/items/6b782a21e0b105ea875c
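I replaced every quantity with a variable and worked it out in Excel, but does it actually reduce to the weight formula?
I have not yet taken the partial derivative of e_h, so I cannot tell. Does it?
Also, have I made a mistake somewhere in the calculation?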
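As an alternative to expanding everything by hand, one way to check the first update line is to compare it with a finite-difference gradient. This is a minimal sketch, assuming a sigmoid activation and a squared-error loss. The helpers `af`, `daf`, and `loss` are stand-ins written for this example, not the article's actual definitions.

```python
import numpy as np

def af(x):
    # Sigmoid activation (assumed; the article's self.af may differ)
    return 1.0 / (1.0 + np.exp(-x))

def daf(o):
    # Sigmoid derivative expressed through the output o = af(x) (assumed)
    return o * (1.0 - o)

def loss(w_ho, o_h, t):
    # Squared error E = 1/2 * sum((t - o_o)^2) (assumed loss)
    o_o = af(w_ho @ o_h)
    return 0.5 * np.sum((t - o_o) ** 2)

rng = np.random.default_rng(0)
w_ho = rng.normal(size=(4, 4))             # hidden-to-output weights
o_h = rng.uniform(0.1, 0.9, size=(4, 1))   # hidden-layer output (column vector)
t = rng.uniform(0.1, 0.9, size=(4, 1))     # target (column vector)

# Analytic expression from the update line, without the learning rate.
# Under the assumptions above it should equal -dE/dw_ho.
o_o = af(w_ho @ o_h)
e_o = t - o_o
analytic = np.dot(e_o * daf(o_o), o_h.T)

# Central finite differences of E with respect to each w_ho[k, j]
eps = 1e-6
numeric = np.zeros_like(w_ho)
for k in range(w_ho.shape[0]):
    for j in range(w_ho.shape[1]):
        w_plus = w_ho.copy()
        w_plus[k, j] += eps
        w_minus = w_ho.copy()
        w_minus[k, j] -= eps
        numeric[k, j] = (loss(w_plus, o_h, t) - loss(w_minus, o_h, t)) / (2 * eps)

# analytic is -dE/dw and numeric is +dE/dw, so their sum should be close to zero
max_diff = np.max(np.abs(analytic + numeric))
print(max_diff)
```

If the printed difference is tiny, the update line is a gradient-descent step on the assumed loss. A large difference would suggest that one of the assumptions (loss, activation, or the meaning of `daf`) does not match the article.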
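Although it is omitted there, the expression for e_h (not e_o) is as follows. (Sorry, my earlier expression for e_h was wrong. I have not corrected it in the image, but the corrected version is below.)
e_h (one line per component, e_h[0] through e_h[3]):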
wh[0,0]{t[0]-f(wh[0,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[0,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[1,0]{t[1]-f(wh[1,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[1,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[2,0]{t[2]-f(wh[2,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[2,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[3,0]{t[3]-f(wh[3,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[3,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}
wh[0,1]{t[0]-f(wh[0,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[0,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[1,1]{t[1]-f(wh[1,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[1,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[2,1]{t[2]-f(wh[2,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[2,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[3,1]{t[3]-f(wh[3,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[3,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}
wh[0,2]{t[0]-f(wh[0,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[0,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[1,2]{t[1]-f(wh[1,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[1,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[2,2]{t[2]-f(wh[2,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[2,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[3,2]{t[3]-f(wh[3,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[3,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}
wh[0,3]{t[0]-f(wh[0,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[0,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[0,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[1,3]{t[1]-f(wh[1,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[1,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[1,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[2,3]{t[2]-f(wh[2,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[2,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[2,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}+wh[3,3]{t[3]-f(wh[3,0]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,1]f(wi[0,0]i[0]+wi[0,1]i[1]+wi[0,2]i[2]+wi[0,3]i[3])+wh[3,2]f(wi[2,0]i[0]+wi[2,1]i[1]+wi[2,2]i[2]+wi[2,3]i[3])+wh[3,3]f(wi[3,0]i[0]+wi[3,1]i[1]+wi[3,2]i[2]+wi[3,3]i[3]))}
That is what it seems to come out to.
Do I then take the partial derivative of this, one component at a time?
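The four long lines above appear to follow one pattern, e_h[j] = Σ_k wh[k,j] · (t[k] − o_o[k]), where o_o[k] = f(Σ_m wh[k,m] · f(Σ_n wi[m,n] · i[n])). The sketch below assumes that pattern, with hidden unit m using row m of wi, and assumes a sigmoid for f. Under those assumptions it checks that the component-wise sum equals the matrix product `np.dot(w_ho.T, e_o)`. The 4×4 sizes and random values are placeholders.

```python
import numpy as np

def af(x):
    # Sigmoid activation (assumed stand-in for f)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n = 4
w_ih = rng.normal(size=(n, n))    # wi in the question
w_ho = rng.normal(size=(n, n))    # wh in the question
o_i = rng.uniform(size=(n, 1))    # input i (column vector)
t = rng.uniform(size=(n, 1))      # target t (column vector)

# Forward pass
o_h = af(w_ih @ o_i)
o_o = af(w_ho @ o_h)
e_o = t - o_o                     # output error, t[k] - o_o[k]

# Component-wise form: e_h[j] = sum over k of w_ho[k, j] * e_o[k]
e_h_loop = np.zeros((n, 1))
for j in range(n):
    for k in range(n):
        e_h_loop[j, 0] += w_ho[k, j] * e_o[k, 0]

# Matrix form: the transpose of w_ho carries the output error back
e_h_mat = np.dot(w_ho.T, e_o)

same = np.allclose(e_h_loop, e_h_mat)
print(same)
```

Written this way, e_h is a single matrix product, so there may be no need to differentiate the expanded expression term by term.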