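The keras-rl DDPG agent ignores the minimum and maximum of the action space

Background / what I want to achieve

I have built a custom environment with OpenAI Gym and want to train it with DDPG using Keras-RL.

Problem / error message

The actions selected by DDPG fall outside the minimum and maximum of the Gym environment's action space.

Relevant source code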

gym
self.jackmax = 75
self.nakamax = 0.5
self.action_space = gym.spaces.Box(
    np.array([-self.jackmax, -self.jackmax, -self.nakamax, -self.nakamax]),
    np.array([self.jackmax, self.jackmax, self.nakamax, self.nakamax]),
    dtype=np.float32)
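
To make the intended bounds concrete, here is a minimal numpy-only sketch of the per-dimension check that such a Box space implies. The helper name `within_bounds` is hypothetical and not part of gym; the values 75 and 0.5 are copied from the definition above, and the two sample actions are taken from the outputs reported in this post.

```python
import numpy as np

# Bounds copied from the action space definition in the post.
jackmax = 75
nakamax = 0.5
low = np.array([-jackmax, -jackmax, -nakamax, -nakamax], dtype=np.float32)
high = np.array([jackmax, jackmax, nakamax, nakamax], dtype=np.float32)


def within_bounds(action, low, high):
    """Return True if every component of action lies in [low, high].

    Hypothetical helper for illustration; it is not a gym function.
    """
    action = np.asarray(action, dtype=np.float32)
    # Reject actions with the wrong number of components before comparing.
    if action.shape != low.shape:
        return False
    return bool(np.all(action >= low) and np.all(action <= high))


if __name__ == "__main__":
    # An action of the kind produced by action_space.sample().
    print(within_bounds([6.510459, 24.0836, 0.19969036, 0.43461454], low, high))
    # An action of the kind the DDPG agent passed to step().
    print(within_bounds([-14413.987, 26360.254, -5294.902, 23021.02], low, high))
```

Gym spaces also provide a `contains` method that serves the same purpose, so a check like this could be placed at the top of the environment's step function while debugging.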

DDPG
import numpy as np
import gym

from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, Concatenate
from keras.optimizers import Adam

import matplotlib.pyplot as plt
from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess


ENV_NAME = 'myenv1-v1'
# Get the environment and extract the number of actions.
env = gym.make(ENV_NAME)
assert len(env.action_space.shape) == 1
nb_actions = env.action_space.shape[0]
print(env.action_space.high)
print(env.action_space.low)
# Next, we build a very simple model.
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('linear'))
print(actor.summary())

action_input = Input(shape=(nb_actions,), name='action_input')
print(action_input)
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = Concatenate()([action_input, flattened_observation])
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(32)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
print(critic.summary())

# Finally, we configure and compile our agent. You can use every built-in Keras optimizer and
# even the metrics!
memory = SequentialMemory(limit=50000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=.15, mu=0., sigma=.3)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=.99, target_model_update=1e-3)
agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])

# Okay, now it's time to learn something! We visualize the training here for show, but this
# slows down training quite a lot. You can always safely abort the training prematurely using
# Ctrl + C.
history = agent.fit(env, nb_steps=50000, visualize=True, verbose=1, nb_max_episode_steps=200)
history = history.history
plt.plot(np.arange(len(history["episode_reward"])), history["episode_reward"])


# Finally, evaluate our algorithm for 5 episodes.
agent.test(env, nb_episodes=5, visualize=True, nb_max_episode_steps=200)

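What I tried

The action space I expect is the range [-75. -75. -0.5 -0.5] to [75. 75. 0.5 0.5], but when I ran the program above, actions such as [-14413.987, 26360.254, -5294.902, 23021.02] were passed as the argument to the gym step function.
I suspected that my action space definition in OpenAI Gym was wrong, so I printed the action space information and took random actions with the OpenAI Gym method action_space.sample().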

getactionspace
import sys

import gym
import numpy as np

env = gym.make('myenv1-v1')
print("action space", env.action_space)  # print the type of action_space
print("action space low", env.action_space.low)  # print the lower bounds of action_space
print("action space high", env.action_space.high)  # print the upper bounds of action_space
print(" ")

observation = env.reset()
for t in range(5):
    env.render()  # render game screen
    action = env.action_space.sample()  # this is random action. replace here to your algorithm!
    observation, reward, done, info = env.step(action)  # get reward and next scene

The result was:

action space Box(4,)
action space low [-75.  -75.   -0.5  -0.5]
action space high [75.  75.   0.5  0.5]
 
[ 6.510459   24.0836      0.19969036  0.43461454]
[-6.8775452e+01  7.1954796e+01 -5.0086170e-02 -5.5468518e-02]
[ 4.3696394  -8.961829    0.48525786 -0.33140418]
[24.149748   39.689617   -0.3158655   0.18729909]
[ 62.634468   -11.608932     0.18010212   0.06715964]

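so I confirmed that the randomly sampled actions stay within the action space.
Next, to check how DDPG itself behaves, I ran it on Pendulum-v0, one of the OpenAI Gym environments. There, the actions passed to the step function were within -2.0 to 2.0, which is the action space of Pendulum-v0.
I would like to track down the cause and would appreciate any advice.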
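
One possible cause, offered as a guess rather than a confirmed diagnosis: the actor in the DDPG code ends in a linear activation, so its raw output is unbounded, and the Ornstein-Uhlenbeck noise is added on top of it. Declaring a Box action space does not by itself constrain what a network outputs. Below is a minimal numpy-only sketch of two common ways to bring a raw output into the Box bounds. The helper names `clip_to_bounds` and `scale_tanh_to_bounds` are hypothetical, and the bounds are copied from the action space in this post.

```python
import numpy as np

# Bounds copied from the action space printed in the post.
low = np.array([-75.0, -75.0, -0.5, -0.5], dtype=np.float32)
high = np.array([75.0, 75.0, 0.5, 0.5], dtype=np.float32)


def clip_to_bounds(raw_action, low, high):
    """Hard-limit each component of the action to [low, high]."""
    return np.clip(np.asarray(raw_action, dtype=np.float32), low, high)


def scale_tanh_to_bounds(raw_action, low, high):
    """Squash with tanh into [-1, 1], then rescale linearly to [low, high]."""
    squashed = np.tanh(np.asarray(raw_action, dtype=np.float32))
    return low + (squashed + 1.0) * 0.5 * (high - low)


if __name__ == "__main__":
    # Out-of-range action reported in the post.
    raw = [-14413.987, 26360.254, -5294.902, 23021.02]
    print(clip_to_bounds(raw, low, high))
    print(scale_tanh_to_bounds(raw, low, high))
```

With keras-rl, a clipping step like this could be placed at the top of the environment's step method or in the `process_action` method of an `rl.core.Processor` subclass passed to the agent. For the tanh variant, the actor's last activation would be 'tanh' and the rescaling would be applied to its output. Which placement fits depends on the setup, which this post does not fully show.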