Question edit history

3

2018/07/24 13:33

Posted

trafalbad

Score 303

test CHANGED
File without changes
test CHANGED
@@ -20,11 +20,17 @@
 
  Addendum
 
-
+ ```python
 
  config = tf.ConfigProto(log_device_placement=True)
 
- sess = tf.Session(config=config) K.set_session(sess)
+ sess = tf.Session(config=config)
+
+
+
+ K.set_session(sess)
+
+ ```
 
  I suspect it would be enough to switch to the above, reduce the image size, and remove the step in the input function that increases the number of images, but I am not sure.
 

2

2018/07/24 13:33

Posted

trafalbad

Score 303

test CHANGED
File without changes
test CHANGED
@@ -16,6 +16,18 @@
 
  I would appreciate your guidance.
 
+
+
+ Addendum
+
+
+
+ config = tf.ConfigProto(log_device_placement=True)
+
+ sess = tf.Session(config=config) K.set_session(sess)
+
+ I suspect it would be enough to switch to the above, reduce the image size, and remove the step in the input function that increases the number of images, but I am not sure.
+
  ```python
 
  # Error

1

Question changed

2018/07/24 13:32

Posted

trafalbad

Score 303

test CHANGED
@@ -1 +1 @@
- About a simple detection algorithm similar to Google's trending searches
+ About the GPU error 'OOM when allocating tensor'
test CHANGED
@@ -1,33 +1,359 @@
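- Google and Yahoo! have a feature called trending search words.
- From what I have read in the literature, it combines several algorithms and is not something that can be built easily.
-
- I built an anomaly detection algorithm for search-word counts with reference to this material (https://www.albert2005.co.jp/knowledge/machine_learning/anomaly_detection_basics/anomaly_detection_time),
- but it cannot identify surges in individual words.
-
- Using the anomaly detection logic in the material above, what kinds of algorithms could detect trending words?
-
- My own idea is
- to use something like a dictionary that pairs each word with an id, count each word per day to build a time series for each word, and apply the same kind of anomaly detection algorithm as above.
-
- Is that a reasonable approach?
-
- **Question: what realistic algorithms are conceivable for something like trending words?**
-
- Any advice, ideas, or opinions would be appreciated.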
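+ Sorry for changing the question.
+
+ When I run on the GPU, I get the error below.
+
+ The environment is an AWS P2 instance (p2.8xlarge), so I don't think it is short of memory, yet the error still occurs even with the batch size set to 8.
+
+ What could be the cause?
+ For reference, I ran this from the terminal of the AWS EC2 instance, not from Jupyter.
+ I would appreciate your guidance.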
+ ```python
+ # Error
+ W tensorflow/core/common_runtime/bfc_allocator.cc:279] *************************************************************************************************xxx
+ 2018-07-24 08:58:04.962110: W tensorflow/core/framework/op_kernel.cc:1295] OP_REQUIRES failed at constant_op.cc:75 : Resource exhausted: OOM when allocating tensor of shape [1,1,1088,192] and type float
+ 2018-07-24 08:58:04.962293: E tensorflow/core/common_runtime/executor.cc:660] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [1,1,1088,192] and type float
+     [[Node: training/SGD/zeros_176 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1,1,1088,192] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
+ error
+ Traceback (most recent call last):
+   File "Inception_resnet_v2_train.py", line 303, in <module>
+     coord.join(threads)
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
+     six.reraise(*self._exc_info_to_raise)
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
+     raise value
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
+     enqueue_callable()
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1244, in _single_operation_run
+     self._call_tf_sessionrun(None, {}, [], target_list, None)
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
+     run_metadata)
+ tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[150,150,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
+     [[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _class=["loc:@random_flip_left_right/Switch_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape)]]
+ Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
+
+     [[Node: per_image_standardization/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_58_per_image_standardization", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
+ Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
+ ```
+
+ Code (excerpt)
+ ```python
+ # Input function
+ from __future__ import print_function
+ from __future__ import absolute_import
+
+ import warnings
+ import time
+ import os
+ import math
+ import numpy as np
+ import tensorflow as tf
+ from keras.optimizers import SGD
+ from keras.callbacks import History
+ from keras.callbacks import Callback
+ from keras.callbacks import ModelCheckpoint
+ from keras.callbacks import TensorBoard
+ from keras.callbacks import CSVLogger
+ from keras import layers
+ from keras.preprocessing import image
+ from keras.models import Model
+ from keras.layers import Activation
+ from keras.layers import AveragePooling2D
+ from keras.layers import BatchNormalization
+ from keras.layers import Concatenate
+ from keras.layers import Conv2D
+ from keras.layers import Dense
+ from keras.layers import GlobalAveragePooling2D
+ from keras.layers import GlobalMaxPooling2D
+ from keras.layers import Input
+ from keras.layers import Lambda
+ from keras.layers import MaxPooling2D
+ from keras.utils.data_utils import get_file
+ from keras.engine.topology import get_source_inputs
+ from keras import backend as K
+ from keras import metrics
+ from keras import utils as np_utils
+ from keras.utils.vis_utils import plot_model, model_to_dot
+ import matplotlib.pyplot as plt
+ from keras.callbacks import EarlyStopping
+ tf.logging.set_verbosity(tf.logging.ERROR)
+
+ # In[2]:
+
+ from tensorflow.python.client import device_lib
+ device_lib.list_local_devices()
+
+ # In[4]:
+
+ def input_data(data_dir, batch_size, distort=False):
+
+     num_class = 45
+     filenames = [os.path.join(data_dir, 'train_%d.tfrecords' % i)
+                  for i in range(1, 61)]
+     for f in filenames:
+         if not tf.gfile.Exists(f):
+             raise ValueError('Failed to find file: ' + f)
+
+     # Create a queue that produces the filenames to read.
+     filename_queue = tf.train.string_input_producer(filenames)
+     reader = tf.TFRecordReader()
+     _, serialized_example = reader.read(filename_queue)
+
+     features = tf.parse_single_example(serialized_example,
+                                        features={"label": tf.FixedLenFeature([], tf.int64),
+                                                  "image": tf.FixedLenFeature([], tf.string)})
+
+     label = tf.cast(features["label"], tf.int32)
+     imgin = tf.reshape(tf.decode_raw(features["image"], tf.uint8), tf.stack([150, 150, 3]))
+     float_image = tf.cast(imgin, tf.float32)
+
+     num_preprocess_threads = 16
+     min_fraction_of_examples_in_queue = 0.4
+     NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 2900000
+     # Computed before the branch because both branches use it for the queue capacity.
+     min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
+                              min_fraction_of_examples_in_queue)
+
+     if distort is True:
+         distorted_image = tf.image.random_flip_left_right(float_image)
+
+         distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
+         distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
+         distorted_image = tf.image.per_image_standardization(distorted_image)
+         distorted_image.set_shape([150, 150, 3])
+
+         print ('Filling queue with %d CIFAR images before starting to train. '
+                'This will take a few minutes.' % min_queue_examples)
+
+         images, label_batch = tf.train.shuffle_batch([distorted_image, label], batch_size=batch_size,
+                                                      num_threads=num_preprocess_threads, capacity=min_queue_examples + 3 * batch_size,
+                                                      min_after_dequeue=min_queue_examples)
+
+     else:
+
+         # tf.train.batch does not take min_after_dequeue, so it is not passed here.
+         images, label_batch = tf.train.batch([float_image, label], batch_size=batch_size,
+                                              num_threads=num_preprocess_threads, capacity=min_queue_examples + 3 * batch_size)
+
+     return tf.subtract(tf.div(images,127.5), 1.0), tf.one_hot(tf.reshape(label_batch, [batch_size]),num_class)
+
+ # Session execution part
+ config = tf.ConfigProto(allow_soft_placement=True)
+ config.gpu_options.allocator_type = 'BFC'
+ config.gpu_options.per_process_gpu_memory_fraction = 0.40
+ config.gpu_options.allow_growth=True
+
+ # K.get_session() returns the Keras-managed session; `config` above is not passed to it.
+ sess = K.get_session()
+ train_image, train_labels = input_data('/home/ubuntu/train_tf',16, distort=True)
+ input_ = Input(tensor=train_image)
+ output_ = InceptionResNetV2(img_input=input_)
+ train_model = Model(input_, output_, name='inception_resnet_v2')
+ train_model.compile(optimizer=SGD(decay=0.1, momentum=0.9, nesterov=True),
+                     loss='categorical_crossentropy',
+                     metrics=['accuracy'], target_tensors=[train_labels])
+
+ # In[7]:
+
+ history = History()
+ callback = []
+ # callbacks.append(ModelCheckpoint(filepath="model.best.h5", save_best_only=True))
+ callback.append(history)
+ callback.append(ModelCheckpoint(filepath="/home/ubuntu/check_dir/model.ep{epoch:02d}.h5"))
+ callback.append(EarlyStopping("loss", patience=1))
+
+ # In[8]:
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess, coord)
+ try:
+     history = train_model.fit(epochs=10, steps_per_epoch=int(np.ceil(2900000/16)), callbacks=callback)
+     print(history)
+ except:
+     print('error')
+
+ coord.request_stop()
+ coord.join(threads)
+ ```
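
A back-of-the-envelope check of the queue size requested in `input_data` may help narrow down the cause. The sketch below uses only constants that appear in the code above (the 150x150x3 image shape, `NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN`, the 0.4 fraction, and the batch size of 16 passed to `input_data`). The helper name `bytes_per_image` and the upper-case constant names are my own, and 4 bytes per value assumes float32 images, as produced by `tf.cast(imgin, tf.float32)`. It does not prove that the queue is what exhausts memory; it only estimates how much data `tf.train.shuffle_batch` is asked to buffer.

```python
# Rough estimate of the data volume requested by the shuffle queue.
# Constants are copied from the question's code; names are illustrative.

IMAGE_SHAPE = (150, 150, 3)                  # from tf.stack([150, 150, 3])
BYTES_PER_FLOAT32 = 4                        # assumption: images are float32
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 2900000
MIN_FRACTION_OF_EXAMPLES_IN_QUEUE = 0.4
BATCH_SIZE = 16                              # value passed to input_data()


def bytes_per_image(shape, bytes_per_value=BYTES_PER_FLOAT32):
    """Return the size in bytes of one dense image tensor."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


# Same formulas as in input_data().
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                         MIN_FRACTION_OF_EXAMPLES_IN_QUEUE)
capacity = min_queue_examples + 3 * BATCH_SIZE

# Total bytes if the queue were filled to capacity with float32 images.
queue_bytes = capacity * bytes_per_image(IMAGE_SHAPE)

print("images the queue must hold before dequeuing: %d" % min_queue_examples)
print("queue capacity (images): %d" % capacity)
print("approximate queue size: %.1f GB" % (queue_bytes / 1e9))
```

If these assumptions hold, the requested buffer is on the order of 313 GB, far more than a single GPU can hold. The traceback shows the preprocessing op `Cast_1` placed on `GPU:0`, so lowering `min_queue_examples` and checking op placement with `log_device_placement=True` look like reasonable first experiments.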