`self.aaaaaaaaaaaaaaaaaa` is a variable I added for debugging.
However, its value looks contradictory, and I have no idea why.
It behaves as if, in the following code, `y` did not come out equal to `sum(x)`:

```python
import numpy as np

x = np.random.rand(100)  # generate 100 random numbers
y = 0
for i in range(100):
    y += x[i]
print(sum(x))
print(y)
```
`self.aaaaaaaaaaaaaaaaaa` is a list; I added it so I could see the individual loss values.
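Explanation of the problematic code

I'm working on reinforcement learning.
`self.tortal_losses` is the sum of the loss up to the point where `done` arrives.
The loss is added one step at a time, and when the done signal (which arrives periodically) comes, the total is reset to 0.
Because the loss is squared, it is never negative.
Even though the values in `self.aaaaaaaaaaaaaaaaaa` clearly add up to more than 0.0001, the list is still printed.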
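The accumulate-and-reset logic described above can be reproduced without a GPU as a minimal sketch. It assumes plain Python floats stand in for the detached tensors, and `LossTracker`, `step` and `done` are hypothetical names; only the attribute names `tortal_losses` and `aaaaaaaaaaaaaaaaaa` and the 0.0001 threshold come from the code in this question. Feeding it the per-step loss values from the log makes it possible to check, step by step, what the running total was at the moment each list was printed.

```python
THRESHOLD = 0.0001


class LossTracker:
    """Hypothetical stand-in for the agent: plain floats replace detached tensors."""

    def __init__(self):
        self.tortal_losses = 0.0
        self.aaaaaaaaaaaaaaaaaa = []

    def step(self, loss):
        # Add this step's (never negative) squared loss to the running total.
        self.tortal_losses += loss
        self.aaaaaaaaaaaaaaaaaa.append(loss)
        # Same check as the original: print while the running total is below the threshold.
        printed = self.tortal_losses < THRESHOLD
        if printed:
            # Also print the running total, so it can be compared with the list directly.
            print(f"total={self.tortal_losses:.4e}", self.aaaaaaaaaaaaaaaaaa)
        return printed

    def done(self):
        # Reset on the done signal, mirroring Done() in the original code.
        self.tortal_losses = 0.0
        self.aaaaaaaaaaaaaaaaaa = []


# Per-step losses copied from the first run of printed lists in the log.
losses = [1.0641e-05, 6.0145e-06, 1.8771e-05, 1.0291e-05, 8.9044e-06,
          8.3876e-06, 1.9963e-05, 7.8826e-06, 9.0109e-06]

tracker = LossTracker()
results = [tracker.step(v) for v in losses]
print("running total after the last step:", tracker.tortal_losses)
tracker.done()
```

Printing the running total next to the list, as this sketch does, may be an easier debugging aid than summing the list entries by eye.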
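When the done signal arrives, a line of the form `<episode count (how many times done has arrived)> Episode finished <a value unrelated to this problem> loss tensor(self.tortal_losses)` is printed.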
The problematic part

```python
# Runs once per step: accumulate the detached loss on the CPU
self.tortal_losses += loss.detach().to('cpu')

self.aaaaaaaaaaaaaaaaaa.append(loss.detach())
if self.tortal_losses < 0.0001:
    print(self.aaaaaaaaaaaaaaaaaa)

# Reset part (called when the done signal arrives)
def Done(self, step):
    self.beta = self.beta_initial + (1 - self.beta_initial) * step / self.beta_steps
    self.Rs = [0 for _ in range(multireward_steps)]
    self.tortal_losses = 0
    self.aaaaaaaaaaaaaaaaaa = []  # -----------------------------
```
Output:

```
13 Episode finished -1158.8909563610828 loss tensor(0.0026)
[tensor(1.0641e-05, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0'), tensor(8.9044e-06, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0'), tensor(8.9044e-06, device='cuda:0'), tensor(8.3876e-06, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0'), tensor(8.9044e-06, device='cuda:0'), tensor(8.3876e-06, device='cuda:0'), tensor(1.9963e-05, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0'), tensor(8.9044e-06, device='cuda:0'), tensor(8.3876e-06, device='cuda:0'), tensor(1.9963e-05, device='cuda:0'), tensor(7.8826e-06, device='cuda:0')]
[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0'), tensor(8.9044e-06, device='cuda:0'), tensor(8.3876e-06, device='cuda:0'), tensor(1.9963e-05, device='cuda:0'), tensor(7.8826e-06, device='cuda:0'), tensor(9.0109e-06, device='cuda:0')]
14 Episode finished -1128.1795158087486 loss tensor(0.0038)
[tensor(2.9019e-05, device='cuda:0')]
[tensor(2.9019e-05, device='cuda:0'), tensor(2.4793e-05, device='cuda:0')]
[tensor(2.9019e-05, device='cuda:0'), tensor(2.4793e-05, device='cuda:0'), tensor(1.5492e-05, device='cuda:0')]
[tensor(2.9019e-05, device='cuda:0'), tensor(2.4793e-05, device='cuda:0'), tensor(1.5492e-05, device='cuda:0'), tensor(1.5380e-05, device='cuda:0')]
[tensor(2.9019e-05, device='cuda:0'), tensor(2.4793e-05, device='cuda:0'), tensor(1.5492e-05, device='cuda:0'), tensor(1.5380e-05, device='cuda:0'), tensor(1.0752e-05, device='cuda:0')]
15 Episode finished -1135.8225297289405 loss tensor(0.0252)
16 Episode finished -1153.358029432231 loss tensor(0.0104)
[tensor(1.7044e-05, device='cuda:0')]
[tensor(1.7044e-05, device='cuda:0'), tensor(1.5978e-05, device='cuda:0')]
[tensor(1.7044e-05, device='cuda:0'), tensor(1.5978e-05, device='cuda:0'), tensor(2.3430e-05, device='cuda:0')]
```
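To compare a printed list with the 0.0001 threshold without retyping the numbers, a small helper can pull the values out of one printed line with a regular expression and sum them. `sum_logged_losses` and `TENSOR_VALUE` are hypothetical names, not part of the original code, and the sketch assumes each entry uses the `tensor(<number>, device='cuda:0')` format seen in the log.

```python
import re

# Captures the number right after "tensor(", e.g. "1.0641e-05" in "tensor(1.0641e-05, device='cuda:0')".
TENSOR_VALUE = re.compile(r"tensor\(([-+0-9.eE]+)")


def sum_logged_losses(line):
    """Sum every tensor value that appears in one printed list from the log."""
    return sum(float(v) for v in TENSOR_VALUE.findall(line))


# The longest list printed after the "13 Episode finished" line in the log.
line = ("[tensor(1.0641e-05, device='cuda:0'), tensor(6.0145e-06, device='cuda:0'), "
        "tensor(1.8771e-05, device='cuda:0'), tensor(1.0291e-05, device='cuda:0'), "
        "tensor(8.9044e-06, device='cuda:0'), tensor(8.3876e-06, device='cuda:0'), "
        "tensor(1.9963e-05, device='cuda:0'), tensor(7.8826e-06, device='cuda:0'), "
        "tensor(9.0109e-06, device='cuda:0')]")

total = sum_logged_losses(line)
# Show the sum and whether it is below the threshold used in the original `if`.
print(f"{total:.4e}", total < 0.0001)
```

Running it on the last list printed before each `Episode finished` line gives the running total at that step, which can then be compared with 0.0001 directly.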