Loading a dataset and passing it to the model in deep learning (segmentation)

I have a question about how to feed data to a model for semantic segmentation in deep learning.

I would like to implement image segmentation with SegNet,
but I don't know how to write the code that reads the data from a directory, so I would appreciate some guidance.

I am using Keras as the framework.
The directory layout is as follows.

project
   ├ codes
   │   ├ segnet_model.py (model definition)
   │   └ train.py (training script)
   └ data
       ├ train_images (training images from the original dataset)
       ├ train_labels (ground-truth segmentation images for training, from the original dataset)
       ├ test_images (test images from the original dataset)
       ├ reshaped_train_images (resized training images)
       ├ reshaped_label_images (ground-truth segmentation images for the resized training images)
       ├ reshaped_val_images (resized validation images)
       └ reshaped_val_labels (ground-truth segmentation images for the resized validation images)

The images in the original dataset were large, so I used a separate program to resize them to 360×480 and to split them into training and validation sets.
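
A split like the one described above has to keep each image together with its label image. The following is a minimal sketch of that pairing, assuming (the post does not say this) that an image and its label image share the same file name. `split_pairs`, `val_ratio`, and `seed` are hypothetical names chosen for illustration, not the program that was actually used.

```python
import random


def split_pairs(file_names, val_ratio=0.2, seed=0):
    """Split file names into (train, val) lists.

    Assumes an image and its label image share the same file name, so
    splitting the names once keeps every image/label pair in the same subset.
    """
    names = sorted(file_names)          # deterministic starting order
    random.Random(seed).shuffle(names)  # reproducible shuffle
    n_val = int(len(names) * val_ratio)
    return names[n_val:], names[:n_val]


# Example with ten dummy file names: 8 go to training, 2 to validation.
train_names, val_names = split_pairs(["%03d.png" % i for i in range(10)])
```

When the label images themselves are resized, nearest-neighbour interpolation is the safe choice, because smoother interpolation blends neighbouring colours and produces pixel values that belong to no class.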

The code consists of the following two files.

python
# segnet_model.py
from keras.layers import (Input, Activation, Reshape, Conv2D,
                          MaxPooling2D, UpSampling2D, BatchNormalization)
from keras.models import Model


def SegNet(input_shape=(360, 480, 3), classes=4):
    # Based on https://github.com/alexgkendall/SegNet-Tutorial/blob/master/Example_Models/bayesian_segnet_camvid.prototxt
    img_input = Input(shape=input_shape)
    x = img_input

    # Encoder: three 2x2 poolings reduce 360x480 to 45x60
    x = Conv2D(64, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Conv2D(128, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Conv2D(256, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Conv2D(512, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    # Decoder: three 2x2 upsamplings restore 45x60 to 360x480
    x = Conv2D(512, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(256, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(128, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(64, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    # Per-pixel class scores, flattened to (height * width, classes)
    # so that the softmax runs over the class axis of every pixel
    x = Conv2D(classes, (1, 1), padding="valid")(x)
    x = Reshape((input_shape[0] * input_shape[1], classes))(x)
    x = Activation("softmax")(x)
    model = Model(img_input, x)
    return model
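
Because the model ends with `Reshape((input_shape[0] * input_shape[1], classes))` followed by a softmax, each training target has to be a one-hot array of shape `(360 * 480, 4)` rather than an RGB image. The following is a minimal NumPy sketch of that conversion. The colours are the ones listed in `categories` in train.py. `encode_label` is a hypothetical helper, and leaving pixels that match no colour as all-zero rows is an assumption (a separate background class may be more appropriate for the actual dataset).

```python
import numpy as np

# Colours copied from `categories` in train.py: car, pedestrian, lane, signal.
# Whether they are in RGB or BGR order depends on how the label images are loaded.
CLASS_COLORS = np.array(
    [[0, 0, 255], [255, 0, 0], [69, 47, 142], [255, 255, 0]], dtype=np.uint8
)


def encode_label(label_img, colors=CLASS_COLORS):
    """Convert an (H, W, 3) colour label image to one-hot (H * W, n_classes)."""
    pixels = label_img.reshape(-1, 1, 3)                 # (H*W, 1, 3)
    match = (pixels == colors[None, :, :]).all(axis=-1)  # (H*W, n_classes)
    return match.astype(np.float32)


# Tiny 2x2 example: car, signal, lane, and one pixel of an unknown colour.
demo = np.array([[[0, 0, 255], [255, 255, 0]],
                 [[69, 47, 142], [1, 2, 3]]], dtype=np.uint8)
onehot = encode_label(demo)
```

For a 360×480 label image the result has 172800 rows, which matches the model output for one sample.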

python
# train.py (incomplete)
import os
import glob
import numpy as np
import keras
from segnet_model import SegNet
from keras.preprocessing.image import ImageDataGenerator

# Configure GPU memory usage (TensorFlow 1.x session API)
import tensorflow as tf
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.8))
session = tf.Session(config=config)
keras.backend.tensorflow_backend.set_session(session)


def main():
    input_shape = (360, 480, 3)
    classes = 4
    epochs = 100
    batch_size = 1
    log_filepath = './logs/'

    data_shape = 360 * 480

    class_weighting = [1, 20, 1, 200]
    # (class name, colour of that class in the label images)
    categories = [('car', [0, 0, 255]), ('pedestrian', [255, 0, 0]), ('lane', [69, 47, 142]), ('signal', [255, 255, 0])]
    category_item = ['car', 'pedestrian', 'lane', 'signal']
    train_datagen = ImageDataGenerator(rescale=1.0 / 255)
    test_datagen = ImageDataGenerator(rescale=1.0 / 255)

    # Load the data
    # I don't understand how this part relates to the data arguments
    # passed to model.fit() further down
    train_generator = train_datagen.flow_from_directory(
        'PATH/TO/reshaped_train_images',
        target_size=(360, 480),
        color_mode='rgb',
        batch_size=batch_size,
        classes=category_item,
        class_mode='input'
        )

    validation_generator = test_datagen.flow_from_directory(
        'PATH/TO/reshaped_val_images',
        target_size=(360, 480),
        color_mode='rgb',
        batch_size=batch_size,
        classes=category_item,
        class_mode='input'
        )

    tb_cb = keras.callbacks.TensorBoard(log_dir=log_filepath, histogram_freq=1, write_graph=True, write_images=True)
    print("creating model...")

    model = SegNet(input_shape=input_shape, classes=classes)
    model.compile(loss="categorical_crossentropy", optimizer='adadelta', metrics=["accuracy"])

    # train_Y and test_Y are placeholders: the label images are not loaded anywhere yet
    model.fit(train_generator, train_Y, batch_size=batch_size, epochs=epochs,
              verbose=1, class_weight=class_weighting, validation_data=(validation_generator, test_Y),
              shuffle=True, callbacks=[tb_cb])

    model.save('seg.h5')


if __name__ == '__main__':
    main()
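
`flow_from_directory` with `class_mode='input'` yields each image as its own target (the autoencoder case), and its `classes` argument refers to subdirectory names, so the label images never reach the model in train.py. One possible alternative is a plain Python generator that yields `(images, targets)` batches. The sketch below is an illustration under stated assumptions, not part of the Keras API: `paired_batches`, `load_rgb`, and `encode` are hypothetical names. `load_rgb` stands for any function that returns an `(H, W, 3)` uint8 array for a path, and `encode` for a function that maps a label image to a one-hot `(H * W, classes)` array.

```python
import numpy as np


def paired_batches(image_paths, label_paths, load_rgb, encode, batch_size=1):
    """Yield (x, y) batches forever, as Keras generator-based training expects.

    image_paths and label_paths must be in the same order, so that index i
    refers to one image and its ground-truth label image.
    """
    assert len(image_paths) == len(label_paths)
    n = len(image_paths)
    while True:
        order = np.random.permutation(n)  # reshuffle on every pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x = np.stack([load_rgb(image_paths[i]) for i in idx])
            y = np.stack([encode(load_rgb(label_paths[i])) for i in idx])
            # Scale inputs to [0, 1], like rescale=1.0 / 255 in train.py
            yield x.astype(np.float32) / 255.0, y
```

With the Keras version implied by the code above, such a generator would be passed as `model.fit_generator(gen, steps_per_epoch=len(image_paths) // batch_size, ...)`, with a second generator as `validation_data` plus `validation_steps`, instead of passing a separate `train_Y` to `model.fit`. Newer versions accept a generator in `model.fit` directly.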

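What I don't understand is the part of train.py that reads the data from the directories, and how to pass that data to model.fit(). I suspect that the train.py above never specifies the ground-truth segmentation images. (I also don't really understand what generators such as train_generator are for...)
train.py is something I am still in the middle of writing, and I am asking because I don't know how to code it.
I am a beginner, so any advice would be welcome.
Thank you in advance for your help.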
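
A note on the `class_weight=class_weighting` argument in train.py: Keras expects `class_weight` to be a dict mapping class indices to weights, and it does not appear to be supported for targets with three or more dimensions, such as the `(batch, 360 * 480, 4)` targets here. One commonly used workaround, sketched below under those assumptions, is to derive a per-pixel weight array from the one-hot targets and pass it as sample weights (in Keras 2.x this goes together with `sample_weight_mode="temporal"` in `model.compile`). `pixel_weights` is a hypothetical helper.

```python
import numpy as np

# Weights copied from `class_weighting` in train.py (car, pedestrian, lane, signal).
CLASS_WEIGHTING = np.array([1, 20, 1, 200], dtype=np.float32)


def pixel_weights(onehot, class_weights=CLASS_WEIGHTING):
    """Map one-hot targets (..., n_classes) to per-pixel weights (...).

    Each pixel receives the weight of its class; a pixel whose row is all
    zeros (no class) receives weight 0.
    """
    return onehot @ class_weights


# One sample with three pixels: car, signal, pedestrian.
y = np.array([[[1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]], dtype=np.float32)
w = pixel_weights(y)  # shape (1, 3): one weight per pixel
```

A generator used for training can then yield `(x, y, w)` triples instead of `(x, y)` pairs.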