Background / what I want to achieve
I'm a beginner in machine learning and am currently studying it.
I'm building a 25-word lip-reading model in Keras that combines a CNN and an LSTM, but its recognition rate is very low, at around 5%.
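For reference, with 25 roughly balanced classes, random guessing already gives 1/25 = 4% accuracy, so 5% is essentially chance level. A minimal sketch for computing the random-guess and majority-class baselines from a list of integer labels (the helper names and the example labels are hypothetical, not from the original code):

```python
from collections import Counter


def chance_accuracy(n_labels):
    """Expected accuracy of uniform random guessing over n_labels classes."""
    return 1.0 / n_labels


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical labels: 25 classes with 4 samples each (perfectly balanced)
labels = [c for c in range(25) for _ in range(4)]
print(chance_accuracy(25))        # 0.04
print(majority_baseline(labels))  # 0.04
```

Running `majority_baseline` on the real `y_train` shows whether the class distribution is skewed; a model that does not beat both numbers has not learned anything useful yet.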
I can't tell whether this comes from a coding mistake or simply from the model being a poor fit for the task.
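One coding-level detail worth checking: the script feeds raw `cv2.imread` output (BGR channel order, values 0 to 255) into MobileNet with ImageNet weights, whereas Keras's `keras.applications.mobilenet.preprocess_input` scales pixels to the range [-1, 1], and Keras applications assume RGB input. A minimal NumPy sketch of that conversion, assuming frames are `uint8` BGR arrays as OpenCV returns them; `to_mobilenet_input` is a hypothetical helper, and in real code the scaling step can be delegated to `preprocess_input`:

```python
import numpy as np


def to_mobilenet_input(frames_bgr):
    """Convert uint8 BGR frames (..., H, W, 3) to float32 RGB in [-1, 1].

    The scaling x / 127.5 - 1 matches MobileNet's "tf"-style preprocessing.
    """
    frames = np.asarray(frames_bgr, dtype=np.float32)
    frames_rgb = frames[..., ::-1]  # reverse the channel axis: BGR -> RGB
    return frames_rgb / 127.5 - 1.0


# Hypothetical batch: 2 clips x 1 frame x 32 x 32 x 3, every pixel pure blue
batch = np.zeros((2, 1, 32, 32, 3), dtype=np.uint8)
batch[..., 0] = 255  # blue is channel 0 in BGR order
out = to_mobilenet_input(batch)
print(out.shape)                  # (2, 1, 32, 32, 3)
print(out[0, 0, 0, 0].tolist())   # [-1.0, -1.0, 1.0]
```

Because the channel flip and scaling are element-wise, the same function works on the whole `(N, timesteps, H, W, 3)` array produced by `load_data`.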
If anyone can tell what's going on, I'd appreciate your advice.
Relevant source code
```python
import os

import cv2
import numpy as np
from tqdm import tqdm
import keras
from keras import applications
from keras.layers import Input, Flatten, Dense, Dropout, TimeDistributed, LSTM
from keras.models import Model
from keras.utils import np_utils

timesteps = 1  # number of frames per sample fed to the LSTM (1 = a single frame)
n_labels = 25  # number of dataset labels (words)
Learning_rate = 0.0001  # learning rate for the optimizer (Adam here)
batch_size = 32
num_epochs = 1
DATA_PATH = "LFROI"
img_channel = 3  # 3 color channels (cv2.imread loads them in BGR order)
image_size = 32


def load_images(dir_name):
    file_list = os.listdir(dir_name)
    frame_num = len(file_list)

    if frame_num < timesteps:
        # Too few frames: pad by repeating the first and last frames
        dframe = timesteps - frame_num
        iframe = round(dframe / 2)
        fframe = dframe - iframe
        iframes = [1 for x in range(iframe)]
        nframes = [x + 1 for x in range(frame_num)]
        fframes = [frame_num for x in range(fframe)]
        frames = iframes + nframes + fframes
    else:
        # Enough frames: pick `timesteps` evenly spaced 1-based frame numbers
        frames = [round(x * frame_num / timesteps + 1) for x in range(timesteps)]

    frame_array = []
    for i in range(timesteps):
        image_name = os.path.join(dir_name, str(frames[i]).zfill(5) + ".jpg")
        img = cv2.imread(image_name)
        if img is None:
            print("ERROR: can not read image : ", image_name)
        else:
            img = cv2.resize(img, (image_size, image_size))
            frame_array.append(img)

    return np.array(frame_array)


def load_data(list_file):
    # Each line of the list file: "<directory name> <integer label>"
    with open(list_file, "r") as f:
        lines = f.readlines()

    X = []
    labels = []
    for line in tqdm(lines):
        file_name, label = line.split()[:2]
        dir_name = os.path.join(DATA_PATH, file_name)
        labels.append(int(label))
        X.append(load_images(dir_name))

    return np.array(X), labels


print("loading training data...")
x_train, y_train = load_data("…/training_LF-ROI.txt")
print("loading test data...")
x_test, y_test = load_data("…/test_LF-ROI.txt")

X_train = x_train.reshape((x_train.shape[0], timesteps, image_size, image_size, img_channel))
X_test = x_test.reshape((x_test.shape[0], timesteps, image_size, image_size, img_channel))
Y_train = np_utils.to_categorical(y_train, n_labels)
Y_test = np_utils.to_categorical(y_test, n_labels)
X_train = X_train.astype("float32")
X_test = X_test.astype("float32")

print("X_shape:{}\nY_shape:{}".format(X_train.shape, Y_train.shape))
print("X_shape:{}\nY_shape:{}".format(X_test.shape, Y_test.shape))

video = Input(shape=(timesteps, image_size, image_size, img_channel))
# Frozen MobileNet (ImageNet weights) used as a per-frame feature extractor
model = applications.MobileNet(input_shape=(image_size, image_size, img_channel), weights="imagenet", include_top=False)
model.trainable = False
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.3)(x)
cnn_out = Dense(128, activation="relu")(x)
Lstm_inp = Model(inputs=model.input, outputs=cnn_out)
# Apply the CNN to every frame, then summarize the frame sequence with an LSTM
encoded_frames = TimeDistributed(Lstm_inp)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_Drop = Dropout(0.3)(encoded_sequence)
hidden_layer = Dense(128, activation="relu")(hidden_Drop)  # Dense on the dropout output
outputs = Dense(n_labels, activation="softmax")(hidden_layer)
model = Model([video], outputs)

adam = keras.optimizers.Adam(learning_rate=Learning_rate, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss="categorical_crossentropy", optimizer=adam, metrics=["accuracy"])

hist = model.fit(X_train, Y_train, batch_size=batch_size, validation_data=(X_test, Y_test), shuffle=True, epochs=num_epochs)
```
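To see which frames `load_images` actually reads, its index-selection logic can be pulled out into a pure function. This is a sketch: `sample_frame_indices` is a hypothetical name, and the body reproduces the arithmetic of the code above. With `timesteps = 1`, the `else` branch yields `[1]` for any clip with at least one frame, so every clip is likely represented only by its first image (`00001.jpg`), and the LSTM receives a single time step with no lip motion to model:

```python
def sample_frame_indices(frame_num, timesteps):
    """Return the 1-based frame numbers that load_images would read."""
    if frame_num < timesteps:
        # Pad: repeat the first frame before and the last frame after the clip
        dframe = timesteps - frame_num
        iframe = round(dframe / 2)
        fframe = dframe - iframe
        return [1] * iframe + list(range(1, frame_num + 1)) + [frame_num] * fframe
    # Evenly spaced sampling across the clip
    return [round(x * frame_num / timesteps + 1) for x in range(timesteps)]


print(sample_frame_indices(30, 1))  # [1]
print(sample_frame_indices(10, 5))  # [1, 3, 5, 7, 9]
print(sample_frame_indices(3, 5))   # [1, 1, 2, 3, 3]
```

Raising `timesteps` switches the selection to frames spread evenly across the clip, which is what the `TimeDistributed` + `LSTM` structure is designed to consume.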
Additional information (framework/tool versions, etc.)
Microsoft Visual Studio 2017
tensorflow 2.4.1
keras 2.4.3
Python 3.6.13