Background / What I want to achieve
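With the code below, acc rises above 0.9 and val_acc to about 0.6 on a GPU, but on a TPU both stay low and stop changing partway through training.
Also, I can't increase the batch size any further, yet training on the TPU is slower than on the GPU.
Is there a problem with my code?
If anyone knows of information that might help, I would appreciate it.
Thank you in advance.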
Problem / Error message
```
Epoch 1/50
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(384,), dtype=tf.int32, name='core_id0'), TensorSpec(shape=(384, 128, 128, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(384, 5), dtype=tf.float32, name='dense_1_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 39.066269874572754 secs
INFO:tensorflow:Setting weights on TPU model.
1/4 [======>.......................] - ETA: 6:59 - loss: 1.5895 - acc: 0.2354
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(354,), dtype=tf.int32, name='core_id0'), TensorSpec(shape=(354, 128, 128, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(354, 5), dtype=tf.float32, name='dense_1_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 27.03714942932129 secs
3/4 [=====================>........] - ETA: 1:00 - loss: 3.1962 - acc: 0.3590
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(384,), dtype=tf.int32, name='core_id_10'), TensorSpec(shape=(384, 128, 128, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(384, 5), dtype=tf.float32, name='dense_1_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 19.703600883483887 secs
3/4 [=====================>........] - ETA: 1:02 - loss: 4.8907 - acc: 0.4226
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(354,), dtype=tf.int32, name='core_id_10'), TensorSpec(shape=(354, 128, 128, 3), dtype=tf.float32, name='input_1_10'), TensorSpec(shape=(354, 5), dtype=tf.float32, name='dense_1_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input_1
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 20.408061981201172 secs
4/4 [==============================] - 257s 64s/step - loss: 4.9464 - acc: 0.4242
4/4 [==============================] - 453s 113s/step - loss: 2.7730 - acc: 0.3726 - val_loss: 4.9464 - val_acc: 0.4242
Epoch 2/50
# (middle portion omitted due to the character limit)
Epoch 8/50
4/4 [==============================] - 256s 64s/step - loss: 8.4696 - acc: 0.4241
4/4 [==============================] - 268s 67s/step - loss: 1.4502 - acc: 0.4241 - val_loss: 8.4696 - val_acc: 0.4241
Epoch 9/50
4/4 [==============================] - 250s 63s/step - loss: 11.8974 - acc: 0.1984
4/4 [==============================] - 262s 65s/step - loss: 1.4516 - acc: 0.4240 - val_loss: 11.8974 - val_acc: 0.1984
Epoch 10/50
4/4 [==============================] - 253s 63s/step - loss: 11.6360 - acc: 0.1995
4/4 [==============================] - 265s 66s/step - loss: 1.4501 - acc: 0.4241 - val_loss: 11.6360 - val_acc: 0.1995
Epoch 11/50
4/4 [==============================] - 252s 63s/step - loss: 10.8887 - acc: 0.2013
4/4 [==============================] - 264s 66s/step - loss: 1.4497 - acc: 0.4241 - val_loss: 10.8887 - val_acc: 0.2013
Epoch 12/50
4/4 [==============================] - 255s 64s/step - loss: 9.6763 - acc: 0.2067
4/4 [==============================] - 266s 67s/step - loss: 1.4494 - acc: 0.4241 - val_loss: 9.6763 - val_acc: 0.2067
Epoch 13/50
4/4 [==============================] - 253s 63s/step - loss: 9.1260 - acc: 0.2096
4/4 [==============================] - 265s 66s/step - loss: 1.4492 - acc: 0.4241 - val_loss: 9.1260 - val_acc: 0.2096
Epoch 14/50
4/4 [==============================] - 254s 64s/step - loss: 5.9425 - acc: 0.2304
4/4 [==============================] - 266s 67s/step - loss: 1.4491 - acc: 0.4241 - val_loss: 5.9425 - val_acc: 0.2304
Epoch 15/50
4/4 [==============================] - 255s 64s/step - loss: 5.1473 - acc: 0.2458
4/4 [==============================] - 266s 67s/step - loss: 1.4494 - acc: 0.4241 - val_loss: 5.1473 - val_acc: 0.2458
Epoch 16/50
4/4 [==============================] - 254s 63s/step - loss: 3.9491 - acc: 0.2666
4/4 [==============================] - 265s 66s/step - loss: 1.4492 - acc: 0.4240 - val_loss: 3.9491 - val_acc: 0.2666
Epoch 17/50
4/4 [==============================] - 255s 64s/step - loss: 2.5912 - acc: 0.3689
4/4 [==============================] - 266s 67s/step - loss: 1.4495 - acc: 0.4241 - val_loss: 2.5912 - val_acc: 0.3689
Epoch 18/50
4/4 [==============================] - 252s 63s/step - loss: 2.1936 - acc: 0.3486
4/4 [==============================] - 264s 66s/step - loss: 1.4489 - acc: 0.4241 - val_loss: 2.1936 - val_acc: 0.3486
Epoch 19/50
4/4 [==============================] - 255s 64s/step - loss: 1.7816 - acc: 0.3842
4/4 [==============================] - 267s 67s/step - loss: 1.4492 - acc: 0.4241 - val_loss: 1.7816 - val_acc: 0.3842
Epoch 20/50
4/4 [==============================] - 258s 65s/step - loss: 1.5642 - acc: 0.4237
4/4 [==============================] - 270s 67s/step - loss: 1.4491 - acc: 0.4241 - val_loss: 1.5642 - val_acc: 0.4237
Epoch 21/50
4/4 [==============================] - 258s 64s/step - loss: 1.4927 - acc: 0.4227
4/4 [==============================] - 270s 67s/step - loss: 1.4491 - acc: 0.4241 - val_loss: 1.4927 - val_acc: 0.4227
Epoch 22/50
4/4 [==============================] - 257s 64s/step - loss: 1.4711 - acc: 0.4227
4/4 [==============================] - 269s 67s/step - loss: 1.4492 - acc: 0.4241 - val_loss: 1.4711 - val_acc: 0.4227
Epoch 23/50
4/4 [==============================] - 257s 64s/step - loss: 1.4529 - acc: 0.4243
4/4 [==============================] - 268s 67s/step - loss: 1.4496 - acc: 0.4240 - val_loss: 1.4529 - val_acc: 0.4243
# (almost no change from here on)
```
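In the log, each `New input shapes; (re-)compiling` line appears when a batch of a different size arrives (384 vs. 354 per core). `flow_from_directory` yields a shorter final batch whenever the number of images is not a multiple of `batch_size`. The pure-Python sketch below only illustrates that arithmetic: `batch_sizes` and `full_batch_steps` are hypothetical helpers, and the sample count of 10000 is made up, not taken from the question. It lists the batch sizes one pass over the data would produce and how many images fall outside the full batches.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a Keras-style iterator yields in one pass.

    The last batch is shorter when num_samples is not a multiple of batch_size.
    """
    sizes = []
    for start in range(0, num_samples, batch_size):
        sizes.append(min(batch_size, num_samples - start))
    return sizes


def full_batch_steps(num_samples, batch_size):
    """Number of steps that consist only of full-size batches."""
    return num_samples // batch_size


# Hypothetical dataset size (not from the question)
num_samples = 10000
batch_size = 2048

sizes = batch_sizes(num_samples, batch_size)
print(sizes)            # [2048, 2048, 2048, 2048, 1808]
print(len(set(sizes)))  # 2 distinct batch sizes
steps = full_batch_steps(num_samples, batch_size)
print(steps, num_samples - steps * batch_size)  # 4 1808
```

Under this assumption, only a dataset whose size is an exact multiple of the batch size yields a single batch shape per pass.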
Relevant source code
```python
import os  # needed for os.environ["COLAB_TPU_ADDR"]

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = {'png', 'retina'}

import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam, RMSprop, SGD  # was `keras.optimizers`; use tf.keras consistently
from tensorflow.keras.regularizers import l2  # needed for kernel_regularizer=l2(...) below
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model, load_model
from tensorflow.contrib.tpu.python.tpu import keras_support
from functools import reduce
from PIL import ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True

classes = ["White", "Black", "Asian", "Indian", "Others"]
num_classes = len(classes)
image_size = 128

K.clear_session()

# Network definition
net = Xception(include_top=False, weights="imagenet",
               input_shape=(image_size, image_size, 3))

# Freeze every layer except the last 5
for layer in net.layers[:-5]:
    layer.trainable = False

x = net.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, kernel_regularizer=l2(0.001), activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=net.inputs, outputs=predictions)

# Freeze the first 108 layers
for layer in model.layers[:108]:
    layer.trainable = False
    # ...but keep the Batch Normalization layers trainable
    if layer.name.startswith('batch_normalization'):
        layer.trainable = True
    if layer.name.endswith('bn'):
        layer.trainable = True

# Train layer 109 onward
for layer in model.layers[108:]:
    layer.trainable = True

model.compile(
    optimizer=tf.train.AdamOptimizer(learning_rate=0.01),
    loss='categorical_crossentropy',
    metrics=["accuracy"]
)

# Convert the model for the Colab TPU
tpu_grpc_url = "grpc://" + os.environ["COLAB_TPU_ADDR"]
tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu_grpc_url)
strategy = keras_support.TPUDistributionStrategy(tpu_cluster_resolver)
model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)

datagen = ImageDataGenerator(
    rescale=1./255,
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    rotation_range=0,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False
)

batch_size = 256  # 2048 when running on TPU

train_generator = datagen.flow_from_directory(
    '/content/data/train1',
    target_size=(image_size, image_size),
    batch_size=batch_size,
    follow_links=True
)

# Note: validation uses the same augmenting generator as training
validation_generator = datagen.flow_from_directory(
    '/content/data/validation1',
    target_size=(image_size, image_size),
    batch_size=batch_size,
    follow_links=True
)

hist = model.fit_generator(
    train_generator,
    epochs=50,
    validation_data=validation_generator,
    verbose=1,
    max_queue_size=3,
)
```
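The code sets `trainable` in three passes: `net.layers[:-5]`, then `model.layers[:108]` (with BatchNormalization layers switched back on), then `model.layers[108:]`. Because the last two passes together touch every layer of `model`, the first pass has no lasting effect. The sketch below replays the same loops on stub objects to show the resulting flags. `StubLayer`, `apply_freezing`, the layer names, and the cut-off of 4 (standing in for 108) are all hypothetical, chosen only to keep the example small; they are not Keras objects.

```python
class StubLayer:
    """Hypothetical stand-in for a Keras layer: only a name and a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def apply_freezing(layers, num_frozen):
    """Replays the second and third trainable loops from the question's code."""
    # Freeze the first `num_frozen` layers, but re-enable BatchNormalization ones
    for layer in layers[:num_frozen]:
        layer.trainable = False
        if layer.name.startswith('batch_normalization'):
            layer.trainable = True
        if layer.name.endswith('bn'):
            layer.trainable = True
    # Every layer from index `num_frozen` onward is trained
    for layer in layers[num_frozen:]:
        layer.trainable = True
    return layers


# Hypothetical layer names, loosely modelled on Xception's naming
names = ['input_1', 'block1_conv1', 'block1_conv1_bn', 'batch_normalization',
         'block14_sepconv2', 'global_average_pooling2d', 'dense', 'dense_1']
layers = [StubLayer(n) for n in names]

# Mimics the first `net.layers[:-5]` loop; apply_freezing overwrites it
for layer in layers[:-5]:
    layer.trainable = False

apply_freezing(layers, num_frozen=4)
for layer in layers:
    print(layer.name, layer.trainable)
```

In this sketch only `input_1` and `block1_conv1` end up frozen: the BatchNormalization layers inside the frozen range remain trainable, as the code's comment intends.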
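The generator rescales pixels with `rescale=1./255`, which maps them into [0, 1]. For comparison, Keras' `xception.preprocess_input`, which goes with the ImageNet-pretrained Xception weights, maps pixels into [-1, 1] by computing `x / 127.5 - 1`. The plain-Python sketch below only contrasts the two mappings on single pixel values. The helper names are hypothetical, and whether this difference matters for the accuracy problem above is not established here.

```python
def rescale_0_1(pixel):
    """What ImageDataGenerator(rescale=1./255) does to one pixel value."""
    return pixel * (1.0 / 255)


def xception_style(pixel):
    """Scaling used by Xception's preprocess_input ('tf' mode): range [-1, 1]."""
    return pixel / 127.5 - 1.0


for p in (0, 127.5, 255):
    print(p, rescale_0_1(p), xception_style(p))
```

If one wanted the second mapping, `ImageDataGenerator` accepts a `preprocessing_function` argument, to which `tensorflow.keras.applications.xception.preprocess_input` could be passed in place of `rescale`.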