The program below is still unfinished. It ran on Google Colab, but on my company's GPU-equipped machine it raised an error. I would like to know what caused the error.
The program I ran:
```python
import gym
from creversi.gym_reversi.envs import ReversiVecEnv
from creversi import *

import os
import datetime
import math
import random
import numpy as np
from collections import namedtuple
from itertools import count
from tqdm import tqdm_notebook as tqdm
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

BATCH_SIZE = 256

vecenv = ReversiVecEnv(BATCH_SIZE)

# if gpu is to be used
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

######################################################################
# Replay Memory

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'next_actions', 'reward'))


class ReplayMemory(object):

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push(self, *args):
        """Saves a transition."""
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(*args)
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

######################################################################
# DQN

k = 192
fcl_units = 256
class DQN(nn.Module):

    def __init__(self):
        super(DQN, self).__init__()
        self.conv1 = nn.Conv2d(2, k, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(k)
        self.conv2 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(k)
        self.conv3 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(k)
        self.conv4 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm2d(k)
        self.conv5 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn5 = nn.BatchNorm2d(k)
        self.conv6 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn6 = nn.BatchNorm2d(k)
        self.conv7 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn7 = nn.BatchNorm2d(k)
        self.conv8 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn8 = nn.BatchNorm2d(k)
        self.conv9 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn9 = nn.BatchNorm2d(k)
        self.conv10 = nn.Conv2d(k, k, kernel_size=3, padding=1)
        self.bn10 = nn.BatchNorm2d(k)
        self.fcl1 = nn.Linear(k * 64, fcl_units)
        self.fcl2 = nn.Linear(fcl_units, 65)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.relu(self.bn4(self.conv4(x)))
        x = F.relu(self.bn5(self.conv5(x)))
        x = F.relu(self.bn6(self.conv6(x)))
        x = F.relu(self.bn7(self.conv7(x)))
        x = F.relu(self.bn8(self.conv8(x)))
        x = F.relu(self.bn9(self.conv9(x)))
        x = F.relu(self.bn10(self.conv10(x)))
        x = F.relu(self.fcl1(x.view(-1, k * 64)))
        x = self.fcl2(x)
        return x.tanh()

def get_states(envs):
    features_vec = np.zeros((BATCH_SIZE, 2, 8, 8), dtype=np.float32)
    for i, env in enumerate(envs):
        env.board.piece_planes(features_vec[i])
    return torch.from_numpy(features_vec).to(device)
```
At the final line, return torch.from_numpy(features_vec).to(device), no error occurs on Google Colab,
but on the company's GPU-equipped Linux machine I get the following error:
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I searched the web but could not work out the cause.
The GPU environment at my company is as follows.
-bash-4.2$ lspci | grep -i nvidia
3b:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
3b:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
3b:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
3b:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
af:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
af:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)

I would appreciate any guidance.
Thank you in advance.
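
One point can be checked directly from the code in this post: the tensor moved by `get_states` is small. With BATCH_SIZE = 256 it holds 256 × 2 × 8 × 8 float32 values, which is 131072 bytes (128 KiB), so the out-of-memory error is unlikely to come from the size of this tensor alone. A possible explanation (an assumption, not confirmed by the post) is that the default GPU, `cuda:0`, is already occupied by another process on the shared machine, and this line is simply the first CUDA call. The sketch below computes the tensor size and picks the GPU with the most free memory; `tensor_bytes`, `pick_device` and `query_free_bytes` are hypothetical helper names, and `torch.cuda.mem_get_info` is only available in reasonably recent PyTorch versions.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


def pick_device(free_bytes_per_gpu, required_bytes):
    """Return 'cuda:<i>' for the GPU with the most free memory, or 'cpu'.

    free_bytes_per_gpu holds one entry per visible GPU
    (None for a GPU that could not be queried).
    """
    best_index, best_free = None, -1
    for index, free in enumerate(free_bytes_per_gpu):
        if free is not None and free >= required_bytes and free > best_free:
            best_index, best_free = index, free
    return "cpu" if best_index is None else f"cuda:{best_index}"


def query_free_bytes():
    """Ask PyTorch for the free memory of every visible GPU (empty list if none)."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    free = []
    for index in range(torch.cuda.device_count()):
        try:
            free.append(torch.cuda.mem_get_info(index)[0])  # (free, total) in bytes
        except RuntimeError:
            free.append(None)  # for example, the GPU is already full
    return free


state_bytes = tensor_bytes((256, 2, 8, 8))
print(state_bytes)  # 131072
print(pick_device(query_free_bytes(), state_bytes))
```

Running `nvidia-smi` on the machine shows which processes currently hold GPU memory. Setting the `CUDA_VISIBLE_DEVICES` environment variable before starting Python is another way to steer the program to the second card.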

RuntimeError: CUDA error: out of memory occurs in reinforcement learning

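### Background / what I want to achieve

I want to run reinforcement learning in Unity with an agent and environment I built myself. I wrote the code, but it produces errors and I cannot get any further.


### Problem / error messages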

```
Assets\CarAgent.cs(37,26): error CS0115: 'CarAgent.CollectObservations()': no suitable method found to override
Assets\CarAgent.cs(46,26): error CS0115: 'CarAgent.OnActionReceived(float[], string)': no suitable method found to override
Assets\CarAgent.cs(28,26): error CS0115: 'CarAgent.AgentReset()': no suitable method found to override
```

### Relevant source code

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class CarAgent : Agent
{
    
    private RayPerceptionSensor rayPer;
    private Rigidbody rigidbody;

    private Vector3 initPosition;
    private Quaternion initRotation;
    private bool crush;

    public override void OnEpisodeBegin()
    {

        this.rayPer = GetComponent<RayPerceptionSensor>();
        this.rigidbody = GetComponent<Rigidbody>();

        this.initPosition = this.transform.position;
        this.initRotation = this.transform.rotation;
    }

    public override void AgentReset()
    {
        this.transform.position = this.initPosition;
        this.transform.rotation = this.initRotation;
        rigidbody.velocity = new Vector3(0, 0, 0);
        rigidbody.anglarVelocity = new Vector3(0, 0, 0);
        this.crush = false;
    }

    public override void CollectObservations()
    {
        float rayDistance = 50.0f;
        float[] rayAngles = { 0f, 45f, 90f, 135f, 180f, 110f, 70f };
        string[] detectableObjects;
        detectableObjects = new string[] { "car", "wall" };
        addvectoerObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 1f, 0f));
    }

    public override void OnActionReceived(float[]vectorAction,string textAction)
    {
        float handle = Mathf.Clamp(vectorAction[0], -1.0f, 1.0f) * 1.5f;

        this.gameObject.transform.Rotate(0, handle, 0);
        this.rigidbody.velocity = this.gameObject.transform.rotation * new Vector3(0, 0, 20);

        AddReward(0.001f);

        if (this.crush) Done();
    }

    void OnCollisionEnter(Collision collision)
    {
        this.crush = true;
    }

}

```

### What I tried

Does this mean that OnCollisionEnter does not exist in the Agent base class?
Or is some other part of the code wrong?


Resolving "no suitable method found to override" (Unity ML-Agents)

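### Background / what I want to achieve
I want to do reinforcement learning and imitation learning with Unity ML-Agents.

### Problem / error messages
I am using ML-Agents 0.11.0, the latest version, but there are too few references for me to resolve this.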

```
(base) C:\Users\user\Documents\ml-agents-master\ml-agents>mlagents-learn ../config/trainer_config.yaml --run-id=firstRun --train
2019-11-14 14:45:59.431992: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2019-11-14 14:45:59.436532: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓


INFO:mlagents.trainers:CommandLineOptions(debug=False, num_runs=1, seed=-1, env_path=None, run_id='firstRun', load_model=False, train_model=True, save_freq=50000, keep_checkpoints=5, base_port=5005, num_envs=1, curriculum_folder=None, lesson=0, slow=False, no_graphics=False, multi_gpu=False, trainer_config_path='../config/trainer_config.yaml', sampler_file_path=None, docker_target_name=None, env_args=None, cpu=False)
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
```

### Relevant source code
After the message above appears, pressing Play in Unity produces the following message.
```
Process Process-1:
Traceback (most recent call last):
  File "c:\users\user\anaconda3\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "c:\users\user\anaconda3\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\subprocess_env_manager.py", line 82, in worker
    env = env_factory(worker_id)
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\trainers\learn.py", line 359, in create_unity_environment
    args=env_args,
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\environment.py", line 105, in __init__
    aca_output = self.send_academy_parameters(rl_init_parameters_in)
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\environment.py", line 689, in send_academy_parameters
    return self.communicator.initialize(inputs)
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\rpc_communicator.py", line 88, in initialize
    "The Unity environment took too long to respond. Make sure that :\n"
mlagents.envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
         The environment does not need user interaction to launch
         The Agents are linked to the appropriate Brains
         The environment and the Python interface have compatible versions.
Traceback (most recent call last):
  File "c:\users\user\anaconda3\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] パイプは終了しました。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\subprocess_env_manager.py", line 59, in recv
    response: EnvironmentResponse = self.conn.recv()
  File "c:\users\user\anaconda3\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "c:\users\user\anaconda3\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\user\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\Anaconda3\Scripts\mlagents-learn.exe\__main__.py", line 9, in <module>
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\trainers\learn.py", line 408, in main
    run_training(0, run_seed, options, Queue())
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\trainers\learn.py", line 222, in run_training
    options.sampler_file_path, env.reset_parameters, run_seed
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\subprocess_env_manager.py", line 225, in reset_parameters
    return self.env_workers[0].recv().payload
  File "c:\users\user\anaconda3\lib\site-packages\mlagents\envs\subprocess_env_manager.py", line 62, in recv
    raise UnityCommunicationException("UnityEnvironment worker: recv failed.")
mlagents.envs.exception.UnityCommunicationException: UnityEnvironment worker: recv failed.
```

### What I tried
Reinstalling; browsing websites.

### Additional information (framework/tool versions, etc.)

TensorFlow 1.15

About an ML-Agents error

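## Question about reinforcement learning

When implementing a policy-based algorithm, if the agent has, say, five possible actions, can it happen that the probability of one action becomes 1 and the probabilities of all the others become 0?
## Setup

Language: Python
Framework: PyTorch
The policy is produced with PyTorch's F.softmax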
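
Mathematically, softmax gives every action a strictly positive probability, but in floating-point arithmetic the output can become exactly 1 for one action and exactly 0 for the rest when one logit is much larger than the others. The sketch below illustrates this with a plain-Python softmax (not the PyTorch implementation); the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # largest term is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]


balanced = softmax([0.0, 0.0, 0.0, 0.0, 0.0])
saturated = softmax([1000.0, 0.0, 0.0, 0.0, 0.0])

print(balanced)   # [0.2, 0.2, 0.2, 0.2, 0.2]
print(saturated)  # [1.0, 0.0, 0.0, 0.0, 0.0]
```

In the second case `exp(-1000)` underflows to 0.0, so four of the five probabilities vanish. Seeing this during training usually suggests that the logits feeding the softmax have grown very large.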


The policy output in reinforcement learning becomes only 1 or 0

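### What I want to achieve
I am building an AlphaZero-style AI for Pokémon (Monte Carlo tree search, deep learning, reinforcement learning). For some reason a loop runs one time too many. I want to resolve the resulting error.

### Problem / what I don't understand
Because the loop runs one more time than expected, the error below occurs.

### Error message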
```error
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
1/1 [==============================] - 0s 110ms/step
battle実行されました
0
1
モンテカルロ 6
c1 (Jolteon(271), BodySlam, [Jolteon(271)])
サンダース が技のしかかりをつかった!
こうかはいまひとつ... 0.5
サイドン が20をうけた
❤️ サイドン 残りHP 331
サイドン技いわなだれを使った!
急所に当たりました！
こうかはばつぐんだ！ 1.5
サンダースが271を受けた
⭐️ サンダース 残りHP 0
len(self.child_nodes) 1
argmax 0
turn 1
len(pucb_values) 1
pucb_values [array([0., 0., 0., 0.], dtype=float32)]
index <class 'numpy.int64'>
index 0
len(self.child_nodes) 1
self.child_nodes [<__main__.pv_mcts_scores.<locals>.Node object at 0x17a4326d0>]
len(self.child_nodes) 1
argmax 2
turn 2
len(pucb_values) 1
pucb_values [array([1. , 1. , 1.5, 1. ], dtype=float32)]
index <class 'numpy.int64'>
index 2
len(self.child_nodes) 1
self.child_nodes [<__main__.pv_mcts_scores.<locals>.Node object at 0x17a4326d0>]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[91], line 44
     42         c2hp=player1[0].actual_hp
     43     result=((c1,-1,-1,c1hp),(c2,-1,-1,c2hp))
---> 44     next_action=action1(result)
     45 winner = battle.get_winner()
     46 #ゲーム終了時

Cell In[89], line 117, in pv_mcts_action.<locals>.pv_mcts_action(state)
    116 def pv_mcts_action(state):
--> 117     scores = pv_mcts_scores(model, state, temperature,winner)
    118     rng=np.random.default_rng()
    119     return rng.choice([0,1,2,3], p=scores)

Cell In[89], line 102, in pv_mcts_scores(model, state, temperature, winner)
    100 # 複数回の評価の実行
    101 for _ in range(PV_EVALUATE_COUNT):
--> 102     root_node.evaluate()
    104 # 合法手の確率分布
    105 scores = nodes_to_scores(root_node.child_nodes)

Cell In[89], line 64, in pv_mcts_scores.<locals>.Node.evaluate(self)
     59     return value
     61 # 子ノードが存在する時
     62 else:
     63     # アーク評価値が最大の子ノードの評価で価値を取得
---> 64     value = self.next_child_node().evaluate()
     66     # 累計価値と試行回数の更新
     67     self.w += value

Cell In[89], line 95, in pv_mcts_scores.<locals>.Node.next_child_node(self)
     93 print("len(self.child_nodes)",len(self.child_nodes))
     94 print("self.child_nodes",self.child_nodes)
---> 95 return self.child_nodes[a]

IndexError: list index out of range
```

### Relevant source code

```python
from dual_network import DN_INPUT_SHAPE
from math import sqrt
from tensorflow.keras.models import load_model
from pathlib import Path
import numpy as np
import battle
from battle import Battle
import pokedex as p
import moves as m

# パラメータの準備
PV_EVALUATE_COUNT = 50 # 1推論あたりのシミュレーション回数（本家は1600）

# 推論
def predict(model, state):
    # 推論のための入力データのシェイプの変換
    x=np.array(state)
    x=x.reshape(1,4,2)

    # 推論
    y=model.predict(x,batch_size=1)

    # 方策の取得
    policies=y[0][0:4]
    
    # 価値の取得
    value=y[1][0]

    return policies, value    

# ノードのリストを試行回数のリストに変換
def nodes_to_scores(nodes):
    scores = []
    for c in nodes:
        scores.append(c.n)
    return scores

# モンテカルロ木探索のスコアの取得
#def pv_mcts_scores(model, p1_is,p1_mae_action,p1_took_damage,p1_nokorihp,p1_is,p2_mae_action,p2_took_damage,p2_nokorihp, temperature): #stateに8つの状態
def pv_mcts_scores(model, state, temperature,winner=None): #stateに8つの状態
# モンテカルロ木探索のノードの定義
    class Node:
        player1=[
            p.Jolteon([m.BodySlam(),m.DoubleKick(),m.PinMissle(),m.Thunderbolt()])
                ]

        player2=[
            p.Rhydon([m.Earthquake(), m.RockSlide(), m.Surf(), m.BodySlam()])
                ]
        
        # ノードの初期化
        def __init__(self, state, p,winner):
            self.state = state # 状態
            self.p = p # 方策
            self.w = 0 # 累計価値
            self.n = 0 # 試行回数
            self.winner=winner
            self.child_nodes = None  # 子ノード群
            (self.p1_is,self.p1_mae_action,self.p1_took_damage,self.p1_nokorihp),(self.p1_is,self.p2_mae_action,self.p2_took_damage,self.p2_nokorihp)=state
            self.turn=0
            
        # 局面の価値の計算
        def evaluate(self): #Battle が入る
            # ゲーム終了時
            if self.winner is not None:
                # 勝敗結果で価値を取得
                #print("hplen",len(self.p1_nokorihp))
                battle=Battle(player1,player2)
                value = 0 if self.winner == player1 else -1

                # 累計価値と試行回数の更新
                self.w += value
                self.n += 1
                return value

            # 子ノードが存在しない時
            if not self.child_nodes:
                # ニューラルネットワークの推論で方策と価値を取得
                policies, value = predict(model, state)

                # 累計価値と試行回数の更新
                self.w += value
                self.n += 1

                
                # 子ノードの展開
                self.child_nodes = []
                a=[6,7,8,9]
                for action, policy in zip(a, policies):
                    battle=Battle(player1,player2)
                    zyoutai=battle.forward_step(self.p1_nokorihp,self.p2_nokorihp,action)
                    winner = battle.get_winner()
                    self.child_nodes.append(Node(zyoutai, policy,winner))


                return value

            # 子ノードが存在する時
            else:
                # アーク評価値が最大の子ノードの評価で価値を取得
                value = self.next_child_node().evaluate()

                # 累計価値と試行回数の更新
                self.w += value
                self.n += 1
                return value

        # アーク評価値が最大の子ノードを取得
        def next_child_node(self):
            # アーク評価値の計算
            C_PUCT = 1.0
            t = sum(nodes_to_scores(self.child_nodes))
            pucb_values = []
            #print("前 child_nodes",len(self.child_nodes))
            for child_node in self.child_nodes:
                pucb_values.append((-child_node.w / child_node.n if child_node.n else 0.0) +
                    C_PUCT * child_node.p * sqrt(t) / (1 + child_node.n))
                self.turn+=1

            # アーク評価値が最大の子ノードを返す
            print("argmax",np.argmax(pucb_values))
            print("turn",self.turn)
            print("len(pucb_values)",len(pucb_values))
            print("pucb_values",pucb_values)
            index=np.argmax(pucb_values)
            a = index.item()
            print("index",type(index))
            print("index",index)
            print("len(self.child_nodes)",len(self.child_nodes))
            print("self.child_nodes",self.child_nodes)
            return self.child_nodes[a]

    # 現在の局面のノードの作成
    root_node = Node(state, 0,winner)

    # 複数回の評価の実行
    for _ in range(PV_EVALUATE_COUNT):
        root_node.evaluate()

    # 合法手の確率分布
    scores = nodes_to_scores(root_node.child_nodes)
    if temperature == 0: # 最大値のみ1
        action = np.argmax(scores)
        scores = np.zeros(len(scores))
        scores[action] = 1
    else: # ボルツマン分布でバラつき付加
        scores = boltzman(scores, temperature)
    return scores

# モンテカルロ木探索で行動選択
def pv_mcts_action(model, temperature=0):
    def pv_mcts_action(state):
        scores = pv_mcts_scores(model, state, temperature,winner)
        rng=np.random.default_rng()
        return rng.choice([0,1,2,3], p=scores)
    return pv_mcts_action

# ボルツマン分布
def boltzman(xs, temperature):
    xs = [x ** (1 / temperature) for x in xs]
    return [x / sum(xs) for x in xs]
```

```python
import moves as m
import pokedex as p
from damage import calculate_damage

# 動作確認
if __name__ == '__main__':
    # モデルの読み込み
    path = sorted(Path('./model').glob('*.h5'))[-1]
    model = load_model(str(path))
    winner=None
    # 状態の生成
    player1=[
        p.Jolteon([m.BodySlam(),m.DoubleKick(),m.PinMissle(),m.Thunderbolt()])
        ]

    player2=[
        p.Rhydon([m.Earthquake(), m.RockSlide(), m.Surf(), m.BodySlam()])
        ]

    battle=Battle(player1,player2)

    # モンテカルロ木探索で行動取得を行う関数の生成
    action1 = pv_mcts_action(model, 1.0)

    result=None
    while True:
        if result is not None:
            if winner is not None:
                print("バトルは終了しました")
                break
            else:
                result=battle.forward_step(action=next_action)
                next_action=action1(result)
        else:
            #１番目(resultない)
            #result= battle.forward_step()
            if player1[0].spe > player2[0].spe:
                c1=1
                c2=0
                c1hp=player1[0].actual_hp
                c2hp=player2[0].actual_hp
            else:
                c1=0
                c2=1
                c1hp=player2[0].actual_hp
                c2hp=player1[0].actual_hp
            result=((c1,-1,-1,c1hp),(c2,-1,-1,c2hp))
            next_action=action1(result)
        winner = battle.get_winner()
        #ゲーム終了時
        if winner is not None or battle.turn > 500:
            break
```

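### What I tried / looked up
- [ ] Searched teratail, Google, etc.
- [x] Modified the source code myself
- [ ] Asked an acquaintance
- [ ] Other

##### Details and results
I found that the loop is, for some reason, being executed twice.

### Notes
state has shape (4,2)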
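
The debug output in this post hints at where the extra iteration may come from: `len(self.child_nodes)` is 1 and `pucb_values` holds a single array of four numbers. A likely reading (an assumption based on the log, not verified against the model) is that `y[0]` in `predict` has shape (1, 4), so `y[0][0:4]` still contains a single row. `zip(a, policies)` then creates only one child whose `p` is the whole 4-element row, and `np.argmax` over the flattened values can return an index up to 3. The sketch below reproduces that behaviour with nested lists standing in for NumPy arrays; `flat_argmax` is a hypothetical stand-in for `np.argmax` called without an `axis` argument.

```python
def flat_argmax(rows):
    """Index of the maximum over all values, flattened row by row."""
    flat = [value for row in rows for value in row]
    return max(range(len(flat)), key=flat.__getitem__)


actions = [6, 7, 8, 9]
policy_output = [[1.0, 1.0, 1.5, 1.0]]  # shape (1, 4): a batch with one row

# Slicing keeps the batch dimension, so zip() pairs only one action.
children_sliced = list(zip(actions, policy_output[0:4]))
# Indexing the first row gives four scalars, one per action.
children_indexed = list(zip(actions, policy_output[0]))

index = flat_argmax([prior for _, prior in children_sliced])
print(len(children_sliced), index)  # 1 2 -> children_sliced[2] would raise IndexError
print(len(children_indexed))        # 4
```

If this is indeed the cause, taking the first row in `predict` (for example `y[0][0]`) would yield one scalar prior per action and four child nodes.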

A for loop runs one extra time

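### Background / what I want to achieve
This is my first question on teratail.

I made my first original game in Unity (something similar to 3DBall) and tried to train it with ML-Agents. As usual, I entered mlagents-learn config/ppo/3B.yaml --run-id=3B1223j in the Anaconda PowerShell Prompt. The Unity logo appeared, but when I pressed the Play button the following error message was displayed.

**__In particular, the last line says "Please add an entry in the configuration file for 3BBrain, or set default_settings."
What exactly do I need to do?__**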


Traceback (most recent call last):
  File "C:\Users\daisuke\anaconda3\envs\mlagents\Scripts\mlagents-learn-script.py", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\learn.py", line 250, in main
    run_cli(parse_command_line())
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\learn.py", line 246, in run_cli
    run_training(run_seed, options)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\learn.py", line 125, in run_training
    tc.start_learning(env_manager)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 173, in start_learning
    self._reset_env(env_manager)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 107, in _reset_env
    self._register_new_behaviors(env_manager, env_manager.first_step_infos)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 268, in _register_new_behaviors
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 166, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 125, in _create_trainer_and_manager
    trainer = self.trainer_factory.generate(brain_name)
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\trainer\trainer_factory.py", line 59, in generate
    trainer_settings = self.trainer_config[behavior_name]
  File "c:\users\daisuke\desktop\ml-agents\ml-agents\mlagents\trainers\settings.py", line 732, in __missing__
    f"The behavior name {key} has not been specified in the trainer configuration. "
mlagents.trainers.exception.TrainerConfigError: The behavior name 3BBrain has not been specified in the trainer configuration. Please add an entry in the configuration file for 3BBrain, or set default_settings.


### What I tried

I searched for the error message but did not get any useful hits.
I would appreciate an answer.
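
The last line of the traceback says that the Behavior Name set in Unity (`3BBrain`) has no matching entry in the YAML file passed to `mlagents-learn`. The sketch below models that lookup with a plain dictionary in place of the parsed YAML. It is a simplified illustration, not the ML-Agents source: the `resolve_settings` helper, the `3DBall` entry and the hyperparameter values are all made up.

```python
def resolve_settings(config, behavior_name):
    """Return trainer settings for a behavior, falling back to default_settings."""
    behaviors = config.get("behaviors", {})
    if behavior_name in behaviors:
        return behaviors[behavior_name]
    if "default_settings" in config:
        return config["default_settings"]
    raise KeyError(
        f"The behavior name {behavior_name} has not been specified "
        "in the trainer configuration."
    )


# Stand-in for a config file whose only entry is named 3DBall (an assumption).
config = {"behaviors": {"3DBall": {"trainer_type": "ppo", "max_steps": 500000}}}

try:
    resolve_settings(config, "3BBrain")
except KeyError as error:
    print("lookup failed:", error)

# Option 1: add an entry whose key matches the Behavior Name exactly.
config["behaviors"]["3BBrain"] = {"trainer_type": "ppo", "max_steps": 500000}
print(resolve_settings(config, "3BBrain"))

# Option 2: provide default_settings that apply to any unlisted behavior.
fallback = {"default_settings": {"trainer_type": "ppo"}, "behaviors": {}}
print(resolve_settings(fallback, "3BBrain"))
```

In the real YAML file, option 1 corresponds to a key under `behaviors:` that matches the Behavior Name shown in the agent's Behavior Parameters component.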

![](86ffaae278ac7313cb165b52ba93bfb6.png)

Unity ML-Agents: I don't understand what this error means

### I want to do machine learning with Unity and ml-agents
How should I set up the environment so that training works in Unity?

### Problem / error messages
I entered mlagents-learn ./config/trainer_config.yaml --run-id=～～～
```
2020-05-28 19:37:39.944015: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-05-28 19:37:39.948579: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From D:\Anaconda\envs\ml-agents\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term


                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓


 Version information:
  ml-agents: 0.16.0,
  ml-agents-envs: 0.16.0,
  Communicator API: 1.0.0,
  TensorFlow: 2.2.0
2020-05-28 19:37:41.755400: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-05-28 19:37:41.760354: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From D:\Anaconda\envs\ml-agents\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-05-28 19:37:43 INFO [environment.py:201] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
```
I press the Play button in Unity (the message seems to say that the model is run for inference instead because the connection failed?)
```
Couldn't connect to trainer on port 5004 using API version 1.0.0. Will perform inference instead.
UnityEngine.Debug:Log(Object)
Unity.MLAgents.Academy:InitializeEnvironment() (at C:/Users/kator/OneDrive/ドキュメント/ml-agents-release_1/ml-agents-release_1/com.unity.ml-agents/Runtime/Academy.cs:394)
Unity.MLAgents.Academy:LazyInitialize() (at C:/Users/kator/OneDrive/ドキュメント/ml-agents-release_1/ml-agents-release_1/com.unity.ml-agents/Runtime/Academy.cs:218)
Unity.MLAgents.Academy:.ctor() (at C:/Users/kator/OneDrive/ドキュメント/ml-agents-release_1/ml-agents-release_1/com.unity.ml-agents/Runtime/Academy.cs:206)
Unity.MLAgents.<>c:<.cctor>b__80_0() (at C:/Users/kator/OneDrive/ドキュメント/ml-agents-release_1/ml-agents-release_1/com.unity.ml-agents/Runtime/Academy.cs:78)
System.Lazy`1:get_Value()
Unity.MLAgents.Academy:get_Instance() (at C:/Users/kator/OneDrive/ドキュメント/ml-agents-release_1/ml-agents-release_1/com.unity.ml-agents/Runtime/Academy.cs:93)
Unity.MLAgents.DecisionRequester:Awake() (at C:/Users/kator/OneDrive/ドキュメント/ml-agents-release_1/ml-agents-release_1/com.unity.ml-agents/Runtime/DecisionRequester.cs:49)
```

In the Anaconda Prompt, the following message appears as a timeout.
```
2020-05-28 19:38:43 INFO [subprocess_env_manager.py:191] UnityEnvironment worker 0: environment stopping.
Traceback (most recent call last):
  File "D:\Anaconda\envs\ml-agents\Scripts\mlagents-learn-script.py", line 11, in <module>
    load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')()
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\learn.py", line 554, in main
    run_cli(parse_command_line())
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\learn.py", line 550, in run_cli
    run_training(run_seed, options)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\learn.py", line 407, in run_training
    tc.start_learning(env_manager)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\trainer_controller.py", line 223, in start_learning
    self._reset_env(env_manager)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\trainer_controller.py", line 154, in _reset_env
    env.reset(config=sampled_reset_param)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\env_manager.py", line 67, in reset
    self.first_step_infos = self._reset_env(config)
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 295, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "c:\users\kator\onedrive\ドキュメント\ml-agents-release_1\ml-agents-release_1\ml-agents\mlagents\trainers\subprocess_env_manager.py", line 92, in recv
    raise env_exception
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
         The environment does not need user interaction to launch
         The Agents are linked to the appropriate Brains
         The environment and the Python interface have compatible versions.
```
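
The logs in this post show the Python trainer listening on port 5004 while the Unity Editor reports that it could not connect to that port. One generic first check, sketched below with only the standard library, is whether anything on the local machine is actually accepting TCP connections on the port while `mlagents-learn` is waiting. `is_port_open` is a hypothetical helper, and the host `localhost` is an assumption.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Run this in a second terminal while mlagents-learn is waiting for Unity.
    print("port 5004 open:", is_port_open("localhost", 5004))
```

If the port is open but Unity still cannot connect, the next thing to compare is the ML-Agents package version in Unity against the Python `ml-agents` version, since the timeout message lists incompatible versions as one possible cause.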

### Relevant source code

```
Omitted because I am running the sample 3DBall scene
```

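### What I tried
Recreating the virtual environment
Installing by following https://note.com/npaka/n/n167b2d03a347?magazine_key=m50f437a3f5e1#gqC2O
Reading the official documentation (it is in English and was hard for me)

### Additional information
ml-agents release1

### Other
The content changes a lot with every version update, so the information got mixed up in my head and I lost track of it. I would be grateful for a gentle explanation.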

Unity ml-agents cannot train: Couldn't connect

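### What I want to achieve


I want to do reinforcement learning with Unity ML-Agents

### Background



I run mlagents-learn config\RollerBall.yaml --run-id=firstRun and press the Play button in Unity, but execution stops immediately, no training happens, and the message below appears.
(I checked similar questions from other people, but their errors were slightly different and did not solve my problem, so I am asking here.)

Also, the message appears to say that MarkupSafe must be 2.1.1 or newer and that mine, 2.0.1, is too old.
If this is the cause, I would appreciate it if you could tell me the recommended way to update it. Thank you in advance.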


### Problem / error message

```
(env) C:\Users\coke8\Downloads\ml-agents-release_19>mlagents-learn ./config/sample/RollerBall.yaml --run-id=RollerBall-1

Traceback (most recent call last):
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 629, in _build_master
    ws.require(__requires__)
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 966, in require
    needed = self.resolve(parse_requirements(requirements))
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 827, in resolve
    dist = self._resolve_dist(
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 873, in _resolve_dist
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (MarkupSafe 2.0.1 (c:\users\coke8\helloworld\env\lib\site-packages), Requirement.parse('MarkupSafe>=2.1.1'), {'werkzeug'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\coke8\helloworld\env\Scripts\mlagents-learn-script.py", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "C:\Users\coke8\helloworld\env\Scripts\mlagents-learn-script.py", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\importlib\metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "c:\users\coke8\downloads\ml-agents-release_19\ml-agents\mlagents\trainers\learn.py", line 2, in <module>
    from mlagents import torch_utils
  File "c:\users\coke8\downloads\ml-agents-release_19\ml-agents\mlagents\torch_utils\__init__.py", line 1, in <module>
    from mlagents.torch_utils.torch import torch as torch  # noqa
  File "c:\users\coke8\downloads\ml-agents-release_19\ml-agents\mlagents\torch_utils\torch.py", line 4, in <module>
    import pkg_resources
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 3324, in <module>
    def _initialize_master_working_set():
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 3298, in _call_aside
    f(*args, **kwargs)
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 3336, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 631, in _build_master
    return cls._build_from_requirements(__requires__)
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 644, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 827, in resolve
    dist = self._resolve_dist(
  File "c:\users\coke8\helloworld\env\lib\site-packages\pkg_resources\__init__.py", line 868, in _resolve_dist
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'MarkupSafe>=2.1.1' distribution was not found and is required by werkzeug
```
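
For reference, the conflict reported in the traceback above (`MarkupSafe 2.0.1` installed, `MarkupSafe>=2.1.1` required by `werkzeug`) can be double-checked from Python inside the same virtual environment. The following is only a minimal sketch: the helper names `parse_version`, `satisfies_minimum`, and `installed_version` are made up for illustration, and the comparison assumes plain dotted numeric versions such as `2.0.1`.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain dotted version such as '2.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # What is actually installed in the active environment.
    print("MarkupSafe installed:", installed_version("MarkupSafe"))
    # The two versions quoted in the error message above.
    print("2.0.1 satisfies >=2.1.1:", satisfies_minimum("2.0.1", "2.1.1"))
```

If the check confirms an older version, upgrading the package with pip in the same virtual environment (for example `pip install --upgrade MarkupSafe`) is one common approach, though it may surface further dependency conflicts.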



### Relevant source code

The contents of the YAML file are as follows (I believe I followed the introductory book exactly).
```yaml
behaviors:
  RollerBall:
    trainer_type: ppo

    max_steps: 500000
    time_horizon: 64
    summary_freq: 1000
    keep_checkpoints: 5

    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 0.0003
      learning_rate_schedule: linear

      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3

    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2

    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```
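
YAML does not allow tab characters for indentation, so a config file like the one above is worth checking for tabs before it is passed to `mlagents-learn`. The following is a minimal sketch of such a check; the function name `find_tab_indented_lines` and the sample text are made up for illustration and are not part of ML-Agents.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


if __name__ == "__main__":
    # Hypothetical excerpt: the third line is indented with a tab.
    sample = "behaviors:\n RollerBall:\n\ttrainer_type: ppo\n"
    print(find_tab_indented_lines(sample))
```

To check a real file, its contents can be read with `open(path, encoding="utf-8").read()` and passed to the function; an empty list means no tab indentation was found.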

### What I tried
Searched for the error message verbatim.
Bought an introductory book on Unity ML-Agents (https://www.borndigital.co.jp/book/19053.html).



### Additional information (framework/tool versions, etc.)
absl-py                              1.4.0
attrs                                22.2.0
mlagents                          0.28.0
mlagents-envs                  0.28.0
wheel                                0.40.0
zipp                                 3.15.0

ML-Agents download page:
https://github.com/Unity-Technologies/ml-agents/releases/tag/release_19