How to change the image size for SegNet

Background / what I want to achieve

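This is a question about the SegNet program published on the linked site.
I posted a question about SegNet before. That time I was asking for a solution to an error message that appeared while training on the image data (running train.py).
For that issue I was advised that the GPU or memory might be underpowered and that I should try reducing the size of the image data. I shrank all of the image files in the dataset folders, rewrote the corresponding places in the program code to match the new size, and ran it again.
As a result, the following error appeared when running train.py.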

Problem / error message

runfile('C:/Users/t.k/.spyder-py3/train.py', wdir='C:/Users/t.k/.spyder-py3')
loading data...
.Traceback (most recent call last):

  File "<ipython-input-3-6de18f7107f1>", line 1, in <module>
    runfile('C:/Users/t.k/.spyder-py3/train.py', wdir='C:/Users/t.k/.spyder-py3')

  File "C:\Users\t.k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 704, in runfile
    execfile(filename, namespace)

  File "C:\Users\t.k\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/t.k/.spyder-py3/train.py", line 59, in <module>
    main()

  File "C:/Users/t.k/.spyder-py3/train.py", line 36, in main
    train_X, train_y = ds.load_data('train') # need to implement, y shape is (None, 360, 480, classes)

  File "C:\Users\t.k\.spyder-py3\dataset.py", line 60, in load_data
    label.append(self.one_hot_it(cv2.imread(os.getcwd() + txt[i][1][7:][:-1])[:,:,0]))

  File "C:\Users\t.k\.spyder-py3\dataset.py", line 43, in one_hot_it
    x[i,j,labels[i][j]] = 1

IndexError: index 12 is out of bounds for axis 2 with size 12
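
For reference, the message means that some pixel in a label image holds the value 12, while the one-hot array has only 12 channels, indexed 0 to 11. The following is a minimal sketch of how such values could be located, assuming only NumPy. The helper name `out_of_range_labels` and the sample array are made up for illustration and are not part of the original program.

```python
import numpy as np


def out_of_range_labels(labels, classes=12):
    """Return the sorted label values that cannot index a one-hot array with `classes` channels."""
    values = np.unique(labels)
    return values[(values < 0) | (values >= classes)]


# Hypothetical 2x3 label image; the value 12 stands in for a pixel corrupted by resizing.
labels = np.array([[0, 3, 11],
                   [4, 12, 3]], dtype=np.uint8)

bad = out_of_range_labels(labels)
print(bad.tolist())  # [12]

# The same kind of failure as in the traceback, reproduced on a tiny array.
x = np.zeros((2, 3, 12))
try:
    x[1, 1, labels[1, 1]] = 1
except IndexError as exc:
    print(exc)
```

Running a check like this over every file in trainannot, testannot, and valannot would show which images contain values outside 0 to 11.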

Relevant source code

dataset
import cv2
import numpy as np

from keras.applications import imagenet_utils

import os

DataPath = './CamVid/'
data_shape = 176*240

class Dataset:
    def __init__(self, classes=12, train_file='train.txt', test_file='test.txt'):
        self.train_file = train_file
        self.test_file = test_file
        self.data_shape = 176*240
        self.classes = classes

    def normalized(self, rgb):
        #return rgb/255.0
        norm=np.zeros((rgb.shape[0], rgb.shape[1], 3),np.float32)

        b=rgb[:,:,0]
        g=rgb[:,:,1]
        r=rgb[:,:,2]

        norm[:,:,0]=cv2.equalizeHist(b)
        norm[:,:,1]=cv2.equalizeHist(g)
        norm[:,:,2]=cv2.equalizeHist(r)

        return norm

    def one_hot_it(self, labels):
        x = np.zeros([176,240,12])
        for i in range(176):
            for j in range(240):
                x[i,j,labels[i][j]] = 1
        return x

    def load_data(self, mode='train'):
        data = []
        label = []
        if (mode == 'train'):
            filename = self.train_file
        else:
            filename = self.test_file

        with open(DataPath + filename) as f:
            txt = f.readlines()
            txt = [line.split(' ') for line in txt]

        for i in range(len(txt)):
            data.append(self.normalized(cv2.imread(os.getcwd() + txt[i][0][7:])))
            label.append(self.one_hot_it(cv2.imread(os.getcwd() + txt[i][1][7:][:-1])[:,:,0]))
            print('.',end='')
        #print("train data file", os.getcwd() + txt[i][0][7:])
        #print("label data raw", cv2.imread(os.getcwd() + '/CamVid/trainannot/0001TP_006690.png'))
        return np.array(data), np.array(label)


    def preprocess_inputs(self, X):
    ### @ https://github.com/fchollet/keras/blob/master/keras/applications/imagenet_utils.py
        """Preprocesses a tensor encoding a batch of images.
        # Arguments
            x: input Numpy tensor, 4D.
            data_format: data format of the image tensor.
            mode: One of "caffe", "tf".
                - caffe: will convert the images from RGB to BGR,
                    then will zero-center each color channel with
                    respect to the ImageNet dataset,
                    without scaling.
                - tf: will scale pixels between -1 and 1,
                    sample-wise.
        # Returns
            Preprocessed tensor.
        """
        return imagenet_utils.preprocess_input(X)

    def reshape_labels(self, y):
        return np.reshape(y, (len(y), self.data_shape, self.classes))
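
As an aside, the nested loops in `one_hot_it` hard-code 176, 240, and 12, so they have to be edited again whenever the image size changes. A size-independent variant might look like the sketch below. It is not the code from the referenced site: the function name `one_hot` and the range check are assumptions added for illustration.

```python
import numpy as np


def one_hot(labels, classes=12):
    """Convert an (H, W) integer label map into an (H, W, classes) one-hot array."""
    labels = np.asarray(labels)
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= classes:
        raise ValueError(
            "label values must lie in [0, %d], got range [%d, %d]" % (classes - 1, lo, hi)
        )
    # Indexing the identity matrix with the label map selects one row per pixel.
    return np.eye(classes)[labels]


demo = np.array([[0, 1], [11, 3]], dtype=np.uint8)
encoded = one_hot(demo)
print(encoded.shape)  # (2, 2, 12)
```

Because the output shape is taken from the input, the same function works for 360×480 and 176×240 label images, and an out-of-range value produces an error that names the offending range.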

model
from keras.layers import Input
from keras.layers.core import Activation, Flatten, Reshape
from keras.layers.convolutional import Convolution2D, Conv2D, MaxPooling2D, UpSampling2D
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.utils import np_utils

def SegNet(input_shape=(176, 240, 3), classes=12):
    ### @ https://github.com/alexgkendall/SegNet-Tutorial/blob/master/Example_Models/bayesian_segnet_camvid.prototxt
    img_input = Input(shape=input_shape)
    x = img_input
    # Encoder
    x = Conv2D(64, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Conv2D(128, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Conv2D(256, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Conv2D(512, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    # Decoder
    x = Conv2D(512, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(256, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(128, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = UpSampling2D(size=(2, 2))(x)
    x = Conv2D(64, (3, 3), padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = Conv2D(classes, (1, 1), padding="valid")(x)
    x = Reshape((input_shape[0] * input_shape[1], classes))(x)
    x = Activation("softmax")(x)
    model = Model(img_input, x)
    return model
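
One point to keep in mind when choosing a new size for this model: it pools by 2 three times and upsamples by 2 three times, and the final `Reshape` expects exactly `input_shape[0] * input_shape[1]` pixels. The sketch below checks whether a size survives that round trip. It assumes the Keras default `padding="valid"` for `MaxPooling2D`, under which each pooling step floors the size; the function name `decoder_output_size` is hypothetical.

```python
def decoder_output_size(size, n_pool=3):
    """Spatial size after n_pool 2x2 poolings (floor division) and n_pool 2x upsamplings."""
    for _ in range(n_pool):
        size //= 2
    return size * 2 ** n_pool


for height, width in [(360, 480), (176, 240), (180, 240)]:
    ok = decoder_output_size(height) == height and decoder_output_size(width) == width
    print((height, width), "round-trips" if ok else "does not round-trip")
```

Both 360×480 and 176×240 are multiples of 8 in each dimension, so they pass. A height of 180 would come back as 176 and the `Reshape` would fail.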

train
import os
import glob
import numpy as np
import keras

from model import SegNet

import dataset

input_shape = (176, 240, 3)
classes = 12
epochs = 10
batch_size = 1
log_filepath='./logs/'

data_shape = 176*240

class_weighting = [0.2595, 0.1826, 4.5640, 0.1417, 0.5051, 0.3826, 9.6446, 1.8418, 6.6823, 6.2478, 3.0, 7.3614]

## set gpu usage
import tensorflow as tf
config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True, per_process_gpu_memory_fraction = 0.8))
session = tf.Session(config=config)
keras.backend.tensorflow_backend.set_session(session)

def main():
    print("loading data...")
    ds = dataset.Dataset(classes=classes)
    train_X, train_y = ds.load_data('train') # need to implement, y shape is (None, 360, 480, classes)

    train_X = ds.preprocess_inputs(train_X)
    train_Y = ds.reshape_labels(train_y)
    print("input data shape...", train_X.shape)
    print("input label shape...", train_Y.shape)

    test_X, test_y = ds.load_data('test') # need to implement, y shape is (None, 360, 480, classes)
    test_X = ds.preprocess_inputs(test_X)
    test_Y = ds.reshape_labels(test_y)

    tb_cb = keras.callbacks.TensorBoard(log_dir=log_filepath, histogram_freq=1, write_graph=True, write_images=True)
    print("creating model...")
    model = SegNet(input_shape=input_shape, classes=classes)
    model.compile(loss="categorical_crossentropy", optimizer='adadelta', metrics=["accuracy"])

    model.fit(train_X, train_Y, batch_size=batch_size, epochs=epochs,
              verbose=1, class_weight=class_weighting , validation_data=(test_X, test_Y), shuffle=True
              , callbacks=[tb_cb])

    model.save('seg.h5')

if __name__ == '__main__':
    main()

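What I tried

I changed the image size of the original dataset from 360×480 to 176×240.
Another user had posted a similar question about changing the image size for SegNet, and I followed the solution given there, but it did not work.
(I used "FastStone Photo Resizer" to resize the images to 176×240, specifying a color depth of 8 bit (256).)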

(Uploaded sample images: testannot, trainannot, valannot)

Supplementary information (FW/tool versions, etc.)

Development environment

Windows7 (64bit)
Spyder (Python 3.6)
Keras 2.2.4
Tensorflow 1.12.0

CPU : Intel(R) Core(TM) i7 970 @ 3.20GHz
Memory (RAM) : 16.0 GB
GPU : NVIDIA GeForce GTX 570


2 answers

Best answer

I checked the image you uploaded with the code below.

Python
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

pil_img = Image.open("./CamVid/trainannot-test.png")  # Load with PIL. This is the image you uploaded.
img = np.asarray(pil_img)  # Convert to a NumPy array.

print(img.shape)  # (176, 240)
print(img)


class Label:
    Sky = 0
    Building = 1
    Pole = 2
    Road = 3
    Pavement = 4
    Tree = 5
    SignSymbol = 6
    Fence = 7
    Car = 8
    Pedestrian = 9
    Bicyclist = 10
    Unlabelled = 11

plt.imshow(np.where(img == 3, 100, 0))  # Change the number after == to display that region; Road = 3
plt.axis('off')
plt.show()

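For example, when I display Sky with img == 0, regions other than the sky also appear to be labelled as Sky, so I suspect this is the cause.
When converting the image size in "FastStone Photo Resizer", what happens if, under "Advanced Options", you set "Resize" to "In Pixels" at 240×176, "Filter" to "<None>", and "Color Depth" to "256 (8bit)"?
Converting with these settings produced files of around 2 kB, and the program ran correctly for me (the image you uploaded was about 4 kB).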
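
To illustrate why the filter setting matters for label images, here is a small NumPy-only sketch. It does not reproduce what FastStone Photo Resizer does internally. It only contrasts a resize that copies existing pixels with one that averages neighbours, which is an assumed stand-in for a smoothing filter. The function names and the 4×4 label map are made up for the example.

```python
import numpy as np


def resize_nearest(labels, out_h, out_w):
    """Resize an (H, W) label map by picking source pixels, never mixing values."""
    in_h, in_w = labels.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return labels[rows[:, None], cols]


def resize_block_mean(labels, factor):
    """Downscale by averaging factor x factor blocks, as a smoothing filter might."""
    h, w = labels.shape
    blocks = labels.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3)).round().astype(labels.dtype)


# Hypothetical label map: one column of Sky (0) next to three columns of Car (8).
labels = np.array([[0, 8, 8, 8]] * 4, dtype=np.uint8)

print(np.unique(resize_nearest(labels, 2, 2)).tolist())  # [0, 8]
print(np.unique(resize_block_mean(labels, 2)).tolist())  # [4, 8]
```

Averaging across the Sky/Car boundary yields 4, which is the Pavement class even though no pavement exists in the source. Class indices are categories rather than intensities, so any resize that blends neighbouring pixels can assign wrong classes along region boundaries.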

Posted 2018/12/26 06:18

hss_


Teagle333

2018/12/26 07:32

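Thank you for your reply. It seems I had some filter set under "Advanced Options". After removing the filter and resizing again, the program ran without problems. I can finally move on to the next step. I expect more questions and problems with SegNet will come up, but for now I will consider this matter resolved. Thank you very much.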


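I am a beginner myself, so I am not sure I can help, but I am the one who posted the "FastStone Photo Resizer" solution mentioned above. I still suspect the problem lies in the images.
Could you upload (insert) one of the converted images from the testannot, trainannot, and valannot folders? I would like to check it.

Posted 2018/12/26 01:00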

hss_


Teagle333

2018/12/26 05:41

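Thank you for your answer. I saw a SegNet question describing the same problem I was stuck on and tried the solution that hss_ posted. However, it did not work even when I followed the described procedure, which is why I posted this question. As requested, I have uploaded one image from each of the testannot, trainannot, and valannot folders. I am sorry for the trouble, but I would greatly appreciate it if you could check them.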
