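What I want to do
I want to classify videos of hand and finger movements, which I recorded myself in grayscale, using fine-tuning and an LSTM, but I am stuck because I do not know how to load the images.
The dataset's directory structure is as follows.
Each of the 35 directories (building, clothes, and so on) contains 100 images (100×100), one per captured frame, and I want to treat each set of 100 images as time-series data.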
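To make the intended input concrete, here is a minimal sketch of loading one class directory as a single time-ordered clip. The helper name `load_clip` is hypothetical, and the sketch assumes the frames are PNG files whose names sort in chronological order (for example zero-padded numbers); neither detail is confirmed by the post.

```python
import glob
import os

import numpy as np
from PIL import Image


def load_clip(clip_dir, frames=100, size=(100, 100)):
    """Load one clip directory into a float32 array of shape (frames, rows, columns, 3)."""
    # Sort by file name so the frames stay in chronological order
    # (assumption: names are zero-padded, e.g. 000.png, 001.png, ...).
    files = sorted(glob.glob(os.path.join(clip_dir, "*.png")))[:frames]
    if len(files) != frames:
        raise ValueError(f"expected {frames} frames in {clip_dir}, found {len(files)}")

    # PIL sizes are (width, height); NumPy arrays are (rows, columns, channels).
    clip = np.empty((frames, size[1], size[0], 3), dtype=np.float32)
    for t, path in enumerate(files):
        with Image.open(path) as image:
            # Replicate the gray channel three times so the frame matches
            # the RGB input expected by ImageNet-pretrained networks.
            rgb = image.convert("RGB").resize(size)
            clip[t] = np.asarray(rgb, dtype=np.float32) / 255.0
    return clip
```

Stacking the clips of all directories with `np.stack` would then give an array of shape (samples, frames, rows, columns, 3), which is the layout a `TimeDistributed` CNN followed by an LSTM expects.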
For the model architecture, I referred to the following site.
(link)
Environment
・Google Colab TPU
・TensorFlow 2.0.0 (Keras API)
・Python 3.6.9
Source code (predict_camera.py)
```python
import glob
import os

import numpy as np
import tensorflow as tf
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input, LSTM, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Nadam
from tensorflow.keras.utils import to_categorical

# For TPU
# Details: https://www.tensorflow.org/guide/distributed_training#tpustrategy
tpu_grpc_url = "grpc://" + os.environ["COLAB_TPU_ADDR"]
tpu_cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu_grpc_url)
tf.config.experimental_connect_to_cluster(tpu_cluster_resolver)

tf.compat.v1.disable_v2_behavior()
tf.compat.v1.disable_eager_execution()

CATEGORIES = 35
frames = 100
rows = 100
columns = 100
channels = 3
batch_size = 32

folder = ["00", "01", "02"]#, "03", "04", "05", "06", "07", "08", "09",
          #"10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
          #"20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
          #"30", "31", "32", "33", "34", "35", "36", "37", "38", "39"]

classes = ["building", "clothes", "cooking", "do", "eat", "go", "gohome",
           "here", "house", "how", "left", "money", "no", "now",
           "old", "place", "purpose", "rainy", "right", "signlanguage", "study",
           "sunny", "sushi", "time", "toilet", "tomorrow", "understand", "want",
           "weather", "what", "when", "which", "who", "why", "you"]

X = []  # one array of shape (frames, rows, columns, channels) per clip
Y = []  # class index of each clip

for number in folder:
    base_dir = "./image/" + number
    for index, name in enumerate(classes):
        class_dir = base_dir + "/" + name
        # Sort by file name so the frames stay in order.
        files = sorted(glob.glob(class_dir + "/*.png"))
        clip = []
        Y.append(index)
        for file in files:
            image = Image.open(file)
            # Convert grayscale to 3 channels for the ImageNet-pretrained VGG16.
            image = image.convert("RGB")
            data = np.asarray(image)
            clip.append(data)
        clip = np.array(clip).astype(np.float32)
        clip = clip / 255.0
        X.append(clip)

X = np.array(X)
Y = np.array(Y)

Y = to_categorical(Y, CATEGORIES)

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.20)

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

def build_model():
    video = Input(shape=(frames,
                         rows,
                         columns,
                         channels))
    cnn_base = VGG16(input_shape=(rows,
                                  columns,
                                  channels),
                     weights="imagenet",
                     include_top=False)
    cnn_out = GlobalAveragePooling2D()(cnn_base.output)
    cnn = Model(inputs=cnn_base.input, outputs=cnn_out)
    # Freeze the pretrained CNN; only the LSTM and Dense layers are trained.
    cnn.trainable = False
    # Apply the same CNN to every frame, then feed the feature sequence to the LSTM.
    encoded_frames = TimeDistributed(cnn)(video)
    encoded_sequence = LSTM(256)(encoded_frames)
    hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
    outputs = Dense(CATEGORIES, activation="softmax")(hidden_layer)
    model = Model([video], outputs)
    optimizer = Nadam(lr=0.002,
                      beta_1=0.9,
                      beta_2=0.999,
                      epsilon=1e-08,
                      schedule_decay=0.004)

    model.compile(loss="categorical_crossentropy",
                  optimizer=optimizer,
                  metrics=["categorical_accuracy"])
    return model

model = build_model()

model.summary()

early_stopping = EarlyStopping(patience=2)
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=100,
          verbose=1,
          validation_split=0.2,
          shuffle=True,
          callbacks=[early_stopping])

# The original passed an undefined name `batch` here; use the same batch size as in fit().
evaluation = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)

model.save('camera.hdf5')
```
Current behavior of the program
```
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 100, 100, 100, 3) 0
_________________________________________________________________
time_distributed (TimeDistri (None, 100, 512)          14714688
_________________________________________________________________
lstm (LSTM)                  (None, 256)               787456
_________________________________________________________________
dense (Dense)                (None, 1024)              263168
_________________________________________________________________
dense_1 (Dense)              (None, 35)                35875
=================================================================
Total params: 15,801,187
Trainable params: 1,086,499
Non-trainable params: 14,714,688
_________________________________________________________________
Train on 67 samples, validate on 17 samples
Epoch 1/100
2020-12-18 17:01:29.295779: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 8192000000 exceeds 10% of system memory.
tcmalloc: large alloc 8192000000 bytes == 0xc62d4000 @ 0x7fd92effbb6b 0x7fd92f01b379 0x7fd9175cfc27 0x7fd9173c2a7f 0x7fd91728e3cb 0x7fd917254526 0x7fd9172553b3 0x7fd917255583 0x7fd91dec45b1 0x7fd9174f5afc 0x7fd9174e8205 0x7fd9175a8811 0x7fd9175a5f08 0x7fd92d8fb6df 0x7fd92e9dd6db 0x7fd92ed1671f
tcmalloc: large alloc 3456065536 bytes == 0x2aef54000 @ 0x7fd92f0191e7 0x7fd91b034ab2 0x7fd91da96e8a 0x7fd91de97282 0x7fd91de98afd 0x7fd91dec089e 0x7fd91dec3d76 0x7fd91dec4837 0x7fd9174f5afc 0x7fd9174e8205 0x7fd9175a8811 0x7fd9175a5f08 0x7fd92d8fb6df 0x7fd92e9dd6db 0x7fd92ed1671f
2020-12-18 17:01:43.445971: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 8192000000 exceeds 10% of system memory.
tcmalloc: large alloc 8192000000 bytes == 0x2aef54000 @ 0x7fd92effbb6b 0x7fd92f01b379 0x7fd9175cfc27 0x7fd9173c2a7f 0x7fd91728e3cb 0x7fd917254526 0x7fd9172553b3 0x7fd917255583 0x7fd91dec45b1 0x7fd9174f5afc 0x7fd9174e8205 0x7fd9175a8811 0x7fd9175a5f08 0x7fd92d8fb6df 0x7fd92e9dd6db 0x7fd92ed1671f
tcmalloc: large alloc 73728262144 bytes == 0x565bca000 @ 0x7fd92f0191e7 0x7fd91b034ab2 0x7fd91da96e8a 0x7fd91de97282 0x7fd91de98afd 0x7fd91dec089e 0x7fd91dec3d76 0x7fd91dec4837 0x7fd9174f5afc 0x7fd9174e8205 0x7fd9175a8811 0x7fd9175a5f08 0x7fd92d8fb6df 0x7fd92e9dd6db 0x7fd92ed1671f
^C (forcibly stopped)
```
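As a rough check on the log above, the 8192000000-byte allocation can be reproduced by hand. The sketch below assumes (this is a guess, not something the log states) that the tensor is the float32 output of VGG16's first convolution layer, which has 64 filters and keeps the 100×100 spatial size, computed for one batch of 32 clips × 100 frames.

```python
# Values taken from the script and the log above.
batch_size = 32
frames = 100
rows, columns = 100, 100
bytes_per_float32 = 4

# Assumption: the tensor is the output of VGG16's first convolution
# (block1_conv1: 64 filters, "same" padding, so rows x columns is unchanged).
first_conv_filters = 64

# TimeDistributed runs the CNN on every frame of every clip in the batch.
images_per_batch = batch_size * frames
activation_bytes = (images_per_batch * rows * columns
                    * first_conv_filters * bytes_per_float32)

print(images_per_batch)                # 3200
print(activation_bytes)                # 8192000000
print(activation_bytes == 8192000000)  # True
```

If this assumption holds, the allocation scales linearly with `batch_size`, so a smaller batch (or fewer frames per clip) would shrink it proportionally.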