
Question edit history

Revision 3

2018/07/24 13:33

Posted by trafalbad

Title: no changes
Body changed:
````diff
@@ -9,9 +9,12 @@
  I would appreciate your guidance.
 
  Addendum
+ ```python
+ config = tf.ConfigProto(log_device_placement=True)
+ sess = tf.Session(config=config)
 
- config = tf.ConfigProto(log_device_placement=True)
- sess = tf.Session(config=config) K.set_session(sess)
+ K.set_session(sess)
+ ```
  I wonder whether changing it to the above, reducing the image size, and removing the processing in the input function that increases the number of images would be enough, but...
  ```python
  # Error
````
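The input function in the question's code keeps `min_after_dequeue = int(2900000 * 0.4)` decoded images in its shuffle queue, with `capacity` larger by `3 * batch_size`, and each queued image is a 150×150×3 float32 tensor. The plain-Python sketch below estimates how much image data such a queue can hold, which puts numbers on the idea of reducing the image size or the number of queued images. It assumes 4 bytes per float32 element, ignores labels and TensorFlow's own overhead, and does not call TensorFlow; the 0.01 fraction is a hypothetical value chosen only for comparison.

```python
# Rough size of the shuffle queue built by input_data() in the question.
# Assumptions: float32 = 4 bytes per element; labels and TensorFlow's own
# bookkeeping are ignored. Arithmetic only, no TensorFlow calls.

FLOAT32_BYTES = 4


def tensor_bytes(shape, dtype_bytes=FLOAT32_BYTES):
    """Bytes needed to store one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * dtype_bytes


def queue_bytes(num_examples, fraction, image_shape, batch_size):
    """Upper bound on image bytes held by a queue sized like the question's
    tf.train.shuffle_batch call: capacity = min_after_dequeue + 3 * batch_size."""
    min_after_dequeue = int(num_examples * fraction)
    capacity = min_after_dequeue + 3 * batch_size
    return capacity * tensor_bytes(image_shape)


if __name__ == "__main__":
    print("one 150x150x3 float32 image:", tensor_bytes([150, 150, 3]), "bytes")
    # 0.4 is the question's value; 0.01 is a hypothetical comparison point.
    for fraction in (0.4, 0.01):
        size = queue_bytes(2900000, fraction, [150, 150, 3], batch_size=16)
        print("fraction %.2f: %.2f GB" % (fraction, size / 1e9))
```

With the question's settings the queue can grow to roughly 313 GB of image data. In this example, lowering the queue fraction shrinks that figure far more than lowering the resolution does: halving each side of the image divides it by four, while going from 0.4 to 0.01 divides it by about forty. Whether the buffer sits in host RAM or GPU memory depends on how TensorFlow places the queue ops, which the sketch does not model.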

Revision 2

2018/07/24 13:33

Posted by trafalbad

Title: no changes
Body changed:
````diff
@@ -7,6 +7,12 @@
  What could be causing this?
  For reference, I ran this in a terminal on AWS EC2, not in Jupyter.
  I would appreciate your guidance.
+
+ Addendum
+
+ config = tf.ConfigProto(log_device_placement=True)
+ sess = tf.Session(config=config) K.set_session(sess)
+ I wonder whether changing it to the above, reducing the image size, and removing the processing in the input function that increases the number of images would be enough, but...
  ```python
  # Error
  W tensorflow/core/common_runtime/bfc_allocator.cc:279] *************************************************************************************************xxx
````

Revision 1

Question changed

2018/07/24 13:32

Posted by trafalbad

Title changed:
````diff
@@ -1,1 +1,1 @@
- About a simple detection algorithm similar to Google's trending words
+ About the GPU error 'OOM when allocating tensor'
````
Body changed:
````diff
@@ -1,17 +1,180 @@
- Google and Yahoo! have something called trending words.
+ Sorry for changing the question.
- From what I read in the literature, they combine several algorithms and are not something you can easily build.
 
- I built an anomaly-detection algorithm for search-word counts, referring to this material (https://www.albert2005.co.jp/knowledge/machine_learning/anomaly_detection_basics/anomaly_detection_time),
- but it cannot pinpoint surges in individual words.
+ When I run it on the GPU, I get the error below.
 
- Building on the anomaly-detection logic in the material above, what algorithms could detect trending words?
+ The environment is an AWS p2 instance (p2.8xlarge), so I don't think memory should run short, but the error occurs even with the batch size set to 8.
 
- My own idea is:
- • Use a dictionary that pairs each word with an ID, count each word per day to build a time series for each word, and apply the same kind of anomaly-detection algorithm as above.
+ What could be causing this?
+ For reference, I ran this in a terminal on AWS EC2, not in Jupyter.
+ I would appreciate your guidance.
+ ```python
+ # Error
+ W tensorflow/core/common_runtime/bfc_allocator.cc:279] *************************************************************************************************xxx
+ 2018-07-24 08:58:04.962110: W tensorflow/core/framework/op_kernel.cc:1295] OP_REQUIRES failed at constant_op.cc:75 : Resource exhausted: OOM when allocating tensor of shape [1,1,1088,192] and type float
+ 2018-07-24 08:58:04.962293: E tensorflow/core/common_runtime/executor.cc:660] Executor failed to create kernel. Resource exhausted: OOM when allocating tensor of shape [1,1,1088,192] and type float
+ [[Node: training/SGD/zeros_176 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [1,1,1088,192] values: [[[0 0 0]]]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
+ error
+ Traceback (most recent call last):
+   File "Inception_resnet_v2_train.py", line 303, in <module>
+     coord.join(threads)
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
+     six.reraise(*self._exc_info_to_raise)
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
+     raise value
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
+     enqueue_callable()
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1244, in _single_operation_run
+     self._call_tf_sessionrun(None, {}, [], target_list, None)
+   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
+     run_metadata)
+ tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[150,150,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
+ [[Node: Cast_1 = Cast[DstT=DT_FLOAT, SrcT=DT_UINT8, _class=["loc:@random_flip_left_right/Switch_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape)]]
+ Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 
- That is roughly what I have in mind. Does it seem reasonable?
+ [[Node: per_image_standardization/_25 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_58_per_image_standardization", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
+ Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
+ ```
````
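Two details in this log help narrow the problem down. First, the allocations that failed are small: checking the two shapes in the messages, `[1,1,1088,192]` and `[150,150,3]`, gives well under 1 MB each, which suggests the GPU's memory was already nearly used up by other tensors when they were requested, rather than any single tensor being too large. Second, every node in the log is placed on `/device:GPU:0`. By default TensorFlow puts ops on the lowest-numbered GPU only, so the relevant budget is one GPU's memory, not the instance total. The sketch below is arithmetic only (no TensorFlow); the 12 GiB per GPU is an assumption derived from AWS listing p2.8xlarge as 8 GPUs with 96 GiB of GPU memory in total.

```python
# Compare the allocations that failed in the log with one GPU's memory.
# Assumptions: float32 = 4 bytes per element; 12 GiB per GPU, derived from
# AWS listing p2.8xlarge as 8 GPUs with 96 GiB of GPU memory in total.

GIB = 1024 ** 3
FLOAT32_BYTES = 4


def tensor_bytes(shape, dtype_bytes=FLOAT32_BYTES):
    """Bytes needed to store one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * dtype_bytes


# Node names and shapes copied from the error log.
failed_allocations = {
    "training/SGD/zeros_176": [1, 1, 1088, 192],
    "Cast_1": [150, 150, 3],
}

# Only /device:GPU:0 appears in the log, so one GPU is the relevant budget.
per_gpu_bytes = 96 * GIB // 8

for node, shape in failed_allocations.items():
    size = tensor_bytes(shape)
    share = 100.0 * size / per_gpu_bytes
    print("%-24s %9d bytes (%.4f%% of one GPU)" % (node, size, share))
```

Separately, the question's code builds a `config` with `allow_growth` and `per_process_gpu_memory_fraction`, but then obtains the session with `sess = K.get_session()` without passing that config, so those options never reach the session that runs the graph. Revision 3's addendum changes how the session is created.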
````diff
 
+ Code (excerpt)
+ ```python
+ # Input function
+ from __future__ import print_function
- **• Question: What realistic algorithms could be used for something like trending words?**
+ from __future__ import absolute_import
 
+ import warnings
+ import time
+ import os
+ import math
+ import numpy as np
+ import tensorflow as tf
+ from keras.optimizers import SGD
+ from keras.callbacks import History
+ from keras.callbacks import Callback
+ from keras.callbacks import ModelCheckpoint
+ from keras.callbacks import TensorBoard
+ from keras.callbacks import CSVLogger
+ from keras import layers
+ from keras.preprocessing import image
+ from keras.models import Model
+ from keras.layers import Activation
+ from keras.layers import AveragePooling2D
+ from keras.layers import BatchNormalization
+ from keras.layers import Concatenate
+ from keras.layers import Conv2D
+ from keras.layers import Dense
+ from keras.layers import GlobalAveragePooling2D
+ from keras.layers import GlobalMaxPooling2D
+ from keras.layers import Input
+ from keras.layers import Lambda
+ from keras.layers import MaxPooling2D
+ from keras.utils.data_utils import get_file
+ from keras.engine.topology import get_source_inputs
+ from keras import backend as K
+ from keras import metrics
+ from keras import utils as np_utils
+ from keras.utils.vis_utils import plot_model, model_to_dot
+ import matplotlib.pyplot as plt
+ from keras.callbacks import EarlyStopping
+ tf.logging.set_verbosity(tf.logging.ERROR)
 
+
+ # In[2]:
+
+
+ from tensorflow.python.client import device_lib
+ device_lib.list_local_devices()
+
+
+ # In[4]:
+
+
+ def input_data(data_dir, batch_size, distort=False):
+
+     num_class = 45
+     filenames = [os.path.join(data_dir, 'train_%d.tfrecords' % i)
- Any advice, ideas, or other opinions are welcome.
+                  for i in range(1, 61)]
+     for f in filenames:
+         if not tf.gfile.Exists(f):
+             raise ValueError('Failed to find file: ' + f)
+
+     # Create a queue that produces the filenames to read.
+     filename_queue = tf.train.string_input_producer(filenames)
+     reader = tf.TFRecordReader()
+     _, serialized_example = reader.read(filename_queue)
+
+     features = tf.parse_single_example(serialized_example,
+         features={"label": tf.FixedLenFeature([], tf.int64),
+                   "image": tf.FixedLenFeature([], tf.string)})
+
+     label = tf.cast(features["label"], tf.int32)
+     imgin = tf.reshape(tf.decode_raw(features["image"], tf.uint8), tf.stack([150, 150, 3]))
+     float_image = tf.cast(imgin, tf.float32)
+
+     num_preprocess_threads = 16
+     min_fraction_of_examples_in_queue = 0.4
+     NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 2900000
+
+     if distort is True:
+         distorted_image = tf.image.random_flip_left_right(float_image)
+
+         distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
+         distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)
+         distorted_image = tf.image.per_image_standardization(distorted_image)
+         distorted_image.set_shape([150, 150, 3])
+
+         min_fraction_of_examples_in_queue = 0.4
+         min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
+                                  min_fraction_of_examples_in_queue)
+         print ('Filling queue with %d CIFAR images before starting to train. '
+                'This will take a few minutes.' % min_queue_examples)
+
+         images, label_batch = tf.train.shuffle_batch([distorted_image, label], batch_size=batch_size,
+             num_threads=num_preprocess_threads, capacity=min_queue_examples + 3 * batch_size,
+             min_after_dequeue=min_queue_examples)
+
+     else:
+
+         images, label_batch = tf.train.batch([float_image, label], batch_size=batch_size,
+             num_threads=num_preprocess_threads, capacity=min_queue_examples + 3 * batch_size,
+             min_after_dequeue=min_queue_examples)
+
+     return tf.subtract(tf.div(images,127.5), 1.0), tf.one_hot(tf.reshape(label_batch, [batch_size]),num_class)
+
+ # Session execution
+ config = tf.ConfigProto(allow_soft_placement=True)
+ config.gpu_options.allocator_type = 'BFC'
+ config.gpu_options.per_process_gpu_memory_fraction = 0.40
+ config.gpu_options.allow_growth=True
+
+ sess = K.get_session()
+ train_image, train_labels = input_data('/home/ubuntu/train_tf',16, distort=True)
+ input_ = Input(tensor=train_image)
+ output_ = InceptionResNetV2(img_input=input_)
+ train_model = Model(input_, output_, name='inception_resnet_v2')
+ train_model.compile(optimizer=SGD(decay=0.1, momentum=0.9, nesterov=True),
+                     loss='categorical_crossentropy',
+                     metrics=['accuracy'], target_tensors=[train_labels])
+
+
+ # In[7]:
+
+
+ history = History()
+ callback = []
+ # callbacks.append(ModelCheckpoint(filepath="model.best.h5", save_best_only=True))
+ callback.append(history)
+ callback.append(ModelCheckpoint(filepath="/home/ubuntu/check_dir/model.ep{epoch:02d}.h5"))
+ callback.append(EarlyStopping("loss", patience=1))
+
+ # In[8]:
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess, coord)
+ try:
+     history = train_model.fit(epochs=10, steps_per_epoch=int(np.ceil(2900000/16)), callbacks=callback)
+     print(history)
+ except:
+     print('error')
+
+ coord.request_stop()
+ coord.join(threads)
+ ```
````
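One more point in `input_data`, separate from the OOM error: with `distort=True` (as in the call above), each image has already gone through `tf.image.per_image_standardization`, which yields roughly zero mean and unit variance, and the `return` statement then applies `x / 127.5 - 1.0`, a scaling meant for raw pixel values in [0, 255]. The pure-Python sketch below mimics both steps on a toy list of values; it reimplements the formula documented for `per_image_standardization`, `(x - mean) / max(stddev, 1/sqrt(N))`, rather than calling TensorFlow, and the pixel values are made up for illustration.

```python
import math


def per_image_standardization(pixels):
    """Pure-Python version of the formula documented for
    tf.image.per_image_standardization: (x - mean) / max(stddev, 1/sqrt(N))."""
    n = len(pixels)
    mean = sum(pixels) / n
    stddev = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    adjusted = max(stddev, 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted for p in pixels]


def scale_to_unit_range(pixels):
    """The scaling in input_data's return statement: x / 127.5 - 1.0,
    which maps raw values in [0, 255] onto [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


raw = [0.0, 64.0, 128.0, 191.0, 255.0]  # toy pixel values, not real data

# Raw pixels: the scaling spreads them over the full [-1, 1] range.
print(scale_to_unit_range(raw))
# Standardized first: every value ends up within about 0.01 of -1.
print(scale_to_unit_range(per_image_standardization(raw)))
```

If that double normalization is unintended, applying only one of the two steps would keep the inputs spread out. Relatedly, the `else` branch would fail if `distort=False` were ever used: `min_queue_examples` is only assigned inside the `if` branch, and `tf.train.batch` has no `min_after_dequeue` parameter.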