What I want to achieve
I am trying to train Gradius in a Gym-Retro environment using keras-rl's DQNAgent.
Problem / error message
The reward does not improve, and the loss balloons to an abnormal degree.
```
chack
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 30, 28)        8224
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 15, 64)        28736
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 15, 64)        36928
_________________________________________________________________
flatten_1 (Flatten)          (None, 15360)             0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               3932416
_________________________________________________________________
dense_2 (Dense)              (None, 36)                9252
=================================================================
Total params: 4,015,556
Trainable params: 4,015,556
Non-trainable params: 0
_________________________________________________________________
None
Training for 1500000 steps ...
   2339/1500000: episode: 1, duration: 47.685s, episode steps: 2339, steps per second: 49, episode reward: 2500.000, mean reward: 1.069 [0.000, 100.000], mean action: 19.122 [0.000, 35.000], mean observation: 0.029 [0.000, 0.980], loss: 36.018083, mean_absolute_error: 11.380395, mean_q: 18.252860
   3936/1500000: episode: 2, duration: 51.391s, episode steps: 1597, steps per second: 31, episode reward: 1800.000, mean reward: 1.127 [0.000, 100.000], mean action: 19.312 [0.000, 35.000], mean observation: 0.027 [0.000, 0.980], loss: 64.386497, mean_absolute_error: 54.420486, mean_q: 68.424599
   6253/1500000: episode: 3, duration: 75.020s, episode steps: 2317, steps per second: 31, episode reward: 3500.000, mean reward: 1.511 [0.000, 100.000], mean action: 16.931 [0.000, 35.000], mean observation: 0.029 [0.000, 0.980], loss: 177.966461, mean_absolute_error: 153.478119, mean_q: 177.061630
... (omitted) ...
1493035/1500000: episode: 525, duration: 95.634s, episode steps: 2823, steps per second: 30, episode reward: 5100.000, mean reward: 1.807 [0.000, 500.000], mean action: 19.664 [0.000, 35.000], mean observation: 0.034 [0.000, 0.980], loss: 26501204410368.000000, mean_absolute_error: 86211024.000000, mean_q: 90254256.000000
1495350/1500000: episode: 526, duration: 78.401s, episode steps: 2315, steps per second: 30, episode reward: 2500.000, mean reward: 1.080 [0.000, 100.000], mean action: 18.652 [0.000, 34.000], mean observation: 0.029 [0.000, 0.980], loss: 23247718449152.000000, mean_absolute_error: 84441184.000000, mean_q: 88424568.000000
1497839/1500000: episode: 527, duration: 84.667s, episode steps: 2489, steps per second: 29, episode reward: 3700.000, mean reward: 1.487 [0.000, 500.000], mean action: 21.676 [0.000, 35.000], mean observation: 0.034 [0.000, 0.980], loss: 23432217493504.000000, mean_absolute_error: 80286264.000000, mean_q: 83946064.000000
done, took 49517.509 seconds
end!
```
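Looking at the log, mean_q grows without bound together with the loss, and the raw per-step reward occasionally spikes to 500, so I suspect the unclipped rewards may be inflating the Q targets. Below is a minimal sketch of reward clipping through keras-rl's Processor hook (process_reward is the actual keras-rl hook; the [-1, 1] range is just my assumption, and I have not tried this yet):

```python
import numpy as np
import rl.core

class ClippedProcessor(rl.core.Processor):
    # keras-rl passes each step's reward through process_reward
    # before it is stored in replay memory
    def process_reward(self, reward):
        # clip to [-1, 1], DQN-paper style (assumed range)
        return np.clip(reward, -1.0, 1.0)
```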
Relevant source code
```python
# import everything needed
import retro
import keras as k
import numpy as np
import rl
import rl.memory
import rl.policy
import rl.agents.dqn
import rl.core
import sys
import gym
from PIL import Image

import tensorflow as tf
from keras.backend import tensorflow_backend

# let TensorFlow grow GPU memory on demand
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
session = tf.Session(config=config)
tensorflow_backend.set_session(session)

# target frame size (width, height) for PIL's resize
win_size = (112, 120)

# redirect stdout to a log file (disabled)
#fo = open('log.txt', 'w')
#sys.stdout = fo

from tensorflow.python.client import device_lib
device_lib.list_local_devices()


class CustomProcessor(rl.core.Processor):

    def process_observation(self, observation):
        # downscale the RGB frame, convert to grayscale, normalize to [0, 1]
        img = Image.fromarray(observation)
        img = img.resize(win_size).convert('L')
        return np.array(img) / 255

    #def process_state_batch(self, batch):
    #    batch = batch.transpose(0, 2, 3, 1)
    #    print(batch.shape)
    #    return batch


myprocessor = CustomProcessor()

"""
Gradius's action space allows up to 9 buttons to be pressed at the
same moment, so the action space has to be discretized.
The approach taken here is to wrap the env class.
"""

class Discretizer(gym.ActionWrapper):

    def __init__(self, env):
        super(Discretizer, self).__init__(env)
        # base button combinations (9-element NES button vectors)
        self._actions = [[0,0,0,0,0,0,0,0,0],
                         [0,0,0,0,0,0,0,0,1],
                         [1,0,0,0,0,0,0,0,0],
                         [1,0,0,0,0,0,0,0,1],
                         [0,0,0,0,1,0,0,0,0],
                         [0,0,0,0,0,1,0,0,0],
                         [0,0,0,0,0,0,1,0,0],
                         [0,0,0,0,0,0,0,1,0],
                         [0,0,0,0,1,0,1,0,0],
                         [0,0,0,0,1,0,0,1,0],
                         [0,0,0,0,0,1,0,1,0],
                         [0,0,0,0,0,1,1,0,0]]
        # combine each button with every directional combination
        for i in range(8):
            self._actions.append((np.array(self._actions[1]) + np.array(self._actions[i + 4])).tolist())
        for i in range(8):
            self._actions.append((np.array(self._actions[2]) + np.array(self._actions[i + 4])).tolist())
        for i in range(8):
            self._actions.append((np.array(self._actions[3]) + np.array(self._actions[i + 4])).tolist())
        # keep the human-readable meanings for debugging
        self.actions = []
        for action in self._actions:
            self.actions.append(env.get_action_meaning(action))
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, a):
        return self._actions[a].copy()


env = retro.make(game="Gradius-Nes", record="./Record")
env = Discretizer(env)

nb_actions = env.action_space.n

normal = k.initializers.glorot_normal()
model = k.Sequential()
win_len = 4  # number of stacked past frames
model.add(k.layers.Conv2D(
    32, kernel_size=8, strides=4, padding="same",
    input_shape=(win_len, 120, 112), kernel_initializer=normal,
    activation="relu", data_format='channels_first'))
print("chack")
model.add(k.layers.Conv2D(
    64, kernel_size=4, strides=2, padding="same",
    kernel_initializer=normal,
    activation="relu"))
model.add(k.layers.Conv2D(
    64, kernel_size=3, strides=1, padding="same",
    kernel_initializer=normal,
    activation="relu"))
model.add(k.layers.Flatten())
model.add(k.layers.Dense(256, kernel_initializer=normal,
                         activation="relu"))
model.add(k.layers.Dense(nb_actions,
                         kernel_initializer=normal,
                         activation="linear"))

memory = rl.memory.SequentialMemory(limit=50000,
                                    window_length=win_len)
policy = rl.policy.EpsGreedyQPolicy()

# an earlier attempt used nb_steps_warmup=10
dqn = rl.agents.DQNAgent(processor=myprocessor, model=model,
                         nb_actions=nb_actions, memory=memory,
                         nb_steps_warmup=1000,
                         target_model_update=1e-2, policy=policy)

dqn.compile(k.optimizers.Adam(lr=1e-3), metrics=['mae'])
print(model.summary())
hist = dqn.fit(env, nb_steps=1500000, visualize=False, verbose=2)
print("end!")
dqn.save_weights("test_model.h5f", overwrite=True)

env.close()
```
What I tried
Suspecting that the past four frames of the screen were not being fed in correctly, I also tried reordering the array axes with np.transpose in CustomProcessor, but the loss still only kept growing and it had no effect.
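For reference, the transpose I tried looked roughly like this (reconstructed from the commented-out process_state_batch in the source above):

```python
def process_state_batch(self, batch):
    # batch comes in as (batch, window, height, width);
    # move the frame-stack axis to the end, channels_last style
    batch = batch.transpose(0, 2, 3, 1)
    return batch
```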
I still suspect that something may be wrong with the convolution layers.
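In particular, the summary above shows conv2d_1 outputting (None, 32, 30, 28) in channels_first order, while conv2d_2 and conv2d_3 put their 64 filters last, so the later layers seem to fall back to the default channels_last and may be treating the frame-stack axis as a spatial one. A sketch of what I believe a consistently channels_first stack would look like (my own untested assumption):

```python
import keras as k

normal = k.initializers.glorot_normal()
win_len = 4  # stacked frames, as in the script above

model = k.Sequential()
model.add(k.layers.Conv2D(32, kernel_size=8, strides=4, padding="same",
                          input_shape=(win_len, 120, 112),
                          kernel_initializer=normal, activation="relu",
                          data_format='channels_first'))
# repeat data_format on every conv layer so none falls back to channels_last
model.add(k.layers.Conv2D(64, kernel_size=4, strides=2, padding="same",
                          kernel_initializer=normal, activation="relu",
                          data_format='channels_first'))
model.add(k.layers.Conv2D(64, kernel_size=3, strides=1, padding="same",
                          kernel_initializer=normal, activation="relu",
                          data_format='channels_first'))
model.add(k.layers.Flatten())
model.add(k.layers.Dense(256, kernel_initializer=normal, activation="relu"))
```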
Supplementary information (framework/tool versions, etc.)
I connect to an external server over ssh and run the script in that environment.
Below is the list of packages installed on the server and their versions, obtained with pip freeze:
```
absl-py==0.7.1
alembic==1.0.10
asn1crypto==0.24.0
astor==0.8.0
async-generator==1.10
attrs==19.1.0
backcall==0.1.0
bleach==3.1.0
certifi==2019.3.9
certipy==0.1.3
cffi==1.12.3
chardet==3.0.4
cloudpickle==1.2.1
cryptography==2.6.1
cycler==0.10.0
decorator==4.4.0
defusedxml==0.6.0
EasyProcess==0.2.7
entrypoints==0.3
future==0.17.1
gast==0.2.2
google-pasta==0.1.7
grpcio==1.21.1
gym==0.13.0
gym-retro==0.7.0
h5py==2.9.0
idna==2.8
ipykernel==5.1.0
ipython==7.5.0
ipython-genutils==0.2.0
jedi==0.13.3
Jinja2==2.10.1
jsonschema==3.0.1
jupyter-client==5.2.4
jupyter-core==4.4.0
jupyterhub==1.0.0
jupyterhub-ldapauthenticator==1.2.2
jupyterlab==0.35.6
jupyterlab-server==0.2.0
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
ldap3==2.6
Mako==1.0.10
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.0.3
mistune==0.8.4
nbconvert==5.5.0
nbformat==4.4.0
notebook==5.7.8
numpy==1.16.4
oauthlib==3.0.1
pamela==1.0.0
pandocfilters==1.4.2
parso==0.4.0
pexpect==4.7.0
pickleshare==0.7.5
pipenv==2018.11.26
prometheus-client==0.6.0
prompt-toolkit==2.0.9
protobuf==3.8.0
ptyprocess==0.6.0
pyasn1==0.4.5
pycparser==2.19
pycurl==7.43.0
pyglet==1.3.2
Pygments==2.4.0
pygobject==3.20.0
pyOpenSSL==19.0.0
pyparsing==2.4.0
pyrsistent==0.15.2
python-apt==1.1.0b1+ubuntu0.16.4.2
python-dateutil==2.8.0
python-editor==1.0.4
PyVirtualDisplay==0.2.4
PyYAML==5.1.1
pyzmq==18.0.1
requests==2.21.0
scipy==1.3.0
Send2Trash==1.5.0
six==1.12.0
SQLAlchemy==1.3.3
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
termcolor==1.1.0
terminado==0.8.2
testpath==0.4.2
tornado==6.0.2
traitlets==4.3.2
unattended-upgrades==0.1
urllib3==1.24.3
virtualenv==16.5.0
virtualenv-clone==0.5.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.4
wrapt==1.11.2
xvfbwrapper==0.2.9
```