Answer edit history

2018/10/20 12:34

Posted

tiitoi

Score 21956

test CHANGED Viewed

@@ -1,39 +1,527 @@
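-> 1. Load images
-> 1. 1. Input the train image data (positive and negative examples)
-> 1. 2. Input the test image data (positive and negative examples)
-> 1. 3. Augment the train images
-Use [ImageDataGenerator](https://keras.io/ja/preprocessing/image/).
-> 2. Data processing;
-> 2.1. Build a neural network that performs binary classification
-> 2.2. Create the trained model using the train data
-> 2.3. Feed the test data to the trained model and perform binary classification
-> 3. Output results;
-> 2.3. Compare the positive examples of the test data with the results output in 2.3. and compute the accuracy
-For building the model, see [a previous question](https://teratail.com/questions/149156).
-> Deliverable;
-> CSV file of the binary classification results
-The results can be written out as CSV with Python's csv module.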
+> When loading images, I would like to specify the folder that contains them
+and load all of the data in it. How should I do that?
+You can do it as follows.
+1. Get the list of files with glob.glob().
+2. Load each image with PIL.Image.open().
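As a rough illustration of the two steps above, the following sketch lists the image files in a folder with `glob.glob()` and opens each one with `PIL.Image.open()`. The function names `list_image_paths` and `load_images`, the `*.jpg` pattern, and the conversion to RGB are assumptions made for this example, not part of the original answer.

```python
import glob
import os


def list_image_paths(dirpath, pattern='*.jpg'):
    """Return the sorted paths of the files in dirpath that match pattern."""
    return sorted(glob.glob(os.path.join(dirpath, pattern)))


def load_images(dirpath, pattern='*.jpg'):
    """Open every matching image in dirpath and return a list of PIL images."""
    # Pillow is imported here so that list_image_paths() works without it.
    from PIL import Image

    images = []
    for path in list_image_paths(dirpath, pattern):
        with Image.open(path) as img:
            # convert() forces the pixel data to be read and returns a copy
            # that stays usable after the file is closed.
            images.append(img.convert('RGB'))
    return images
```

Calling `load_images('PetImages/Cat')` would then return the cat images as a list. For a dataset of this size it is usually better to keep only the paths and load the images in batches.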
+# Steps for binary classification with Keras
+## Dataset used for the sample
+[Kaggle Cats and Dogs Dataset](https://www.microsoft.com/en-us/download/details.aspx?id=54765)
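+After extracting the archive, the directory layout is as follows.
+```
+PetImages
+├── Cat: contains the cat images
+└── Dog: contains the dog images
+```
+Split the images into training and test sets.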
+```
+'''
+Split a directory structure organized into one folder per class, like the one below, into training and test sets.
+input_dirpath
+├── Cat
+└── Dog
+↓
+dataset
+├── test
+│   ├── cat
+│   └── dog
+└── train
+    ├── cat
+    └── dog
+'''
+import glob
+import os
+import shutil
+from PIL import Image
+from sklearn.model_selection import train_test_split
+# dataset/train/dog/11702.jpg
+input_dirpath = 'PetImages'
+output_dirpath = 'dataset'
+for sub_dirpath in glob.glob(os.path.join(input_dirpath, '*')):
+    class_name = os.path.basename(sub_dirpath).lower()
+    # Create the output directories.
+    train_dirpath = os.path.join(output_dirpath, 'train', class_name)
+    test_dirpath = os.path.join(output_dirpath, 'test', class_name)
+    os.makedirs(train_dirpath, exist_ok=True)
+    os.makedirs(test_dirpath, exist_ok=True)
+    # Get the image paths under the subdirectory.
+    img_paths = glob.glob(os.path.join(sub_dirpath, '*.jpg'))
+    print('class {}: {} images found.'.format(class_name, len(img_paths)))
+    # Split the image paths into training and test sets.
+    train_paths, test_paths = train_test_split(img_paths, test_size=0.2)
+    for paths, to_dirpath in zip([train_paths, test_paths], [train_dirpath, test_dirpath]):
+        for img_path in paths:
+            to_filepath = os.path.join(to_dirpath, os.path.basename(img_path))
+            try:
+                img = Image.open(img_path)
+                img.verify()  # check that the image is not corrupted
+                img._getexif()  # check that the EXIF data is not corrupted
+                # Copy the file only if it is a valid image.
+                shutil.copy(img_path, to_filepath)
+                # print('{} ---> {}'.format(img_path, to_filepath))
+            except Exception:
+                # Skip corrupted image files.
+                print('Invalid image found. {}'.format(img_path))
+```
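+## Build the model from a pretrained ResNet-50
+Since we want to do fine-tuning, build the model by attaching fully connected layers after a ResNet-50 pretrained on ImageNet. The pretrained layers are frozen, so only the added layers are trained.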
+```
+import numpy as np
+from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
+from keras.layers import Dense, Flatten
+from keras.models import Model
+from keras.preprocessing import image
+num_classes = 2
+# Build the model.
+# ----------------------------------
+base_model = ResNet50(weights='imagenet', include_top=False,
+                      input_shape=(224, 224, 3))
+# Add fully connected layers for classification after the base model.
+x = base_model.output
+x = Flatten()(x)
+x = Dense(1000, activation='relu')(x)
+output = Dense(num_classes, activation='softmax')(x)
+model = Model(inputs=base_model.input, outputs=output)
+# Since this is fine-tuning, freeze all layers except the added ones (their parameters are not updated).
+for layer in base_model.layers:
+    layer.trainable = False
+# Visualize the model.
+from keras.utils import plot_model
+# plot_model(model, to_file='model.png')
+# Compile the model.
+model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
+              metrics = ['accuracy'])
+```
+## Create generators that load data from the directories and augment it
+The following directory structure is assumed.
+```
+dataset
+├── test
+│   ├── cat
+│   └── dog
+└── train
+    ├── cat
+    └── dog
+```
+Use [ImageDataGenerator](https://keras.io/ja/preprocessing/image/) to create generators that load images from a directory and augment them on the fly.
+```
+train_dirpath = 'dataset/train'
+test_dirpath = 'dataset/test'
+batch_size = 32
+epochs = 10
+# Augmentation parameters for the training images.
+params = {'vertical_flip': True,
+          'horizontal_flip': True,
+          'brightness_range': [0.7, 1.0]}
+# Create the training generator.
+train_datagen = image.ImageDataGenerator(
+    preprocessing_function=preprocess_input, **params)
+train_gen = train_datagen.flow_from_directory(
+    train_dirpath, target_size=model.input_shape[1:3], batch_size=batch_size)
+# Create the image data generator for testing (test images are not augmented).
+test_datagen = image.ImageDataGenerator(preprocessing_function=preprocess_input)
+# Create the test generator.
+test_gen = test_datagen.flow_from_directory(
+    test_dirpath, target_size=model.input_shape[1:3], batch_size=batch_size)
+```
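+## Train the model
+```python
+# Train the model.
+# len(train_gen) is already the number of batches per epoch, so it is not
+# divided by batch_size again. (The log below was recorded with
+# steps_per_epoch=len(train_gen) / batch_size, hence about 24 steps per epoch.)
+history = model.fit_generator(
+    train_gen,
+    steps_per_epoch=len(train_gen),
+    epochs=epochs,
+    validation_data=test_gen,
+    validation_steps=len(test_gen))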
+```
+```
+Found 24331 images belonging to 2 classes.
+Found 8925 images belonging to 2 classes.
+Epoch 1/10
+24/23 [==============================] - 10s 413ms/step - loss: 7.4179 - acc: 0.5208 - val_loss: 7.8353 - val_acc: 0.5139
+Epoch 2/10
+24/23 [==============================] - 5s 188ms/step - loss: 6.2514 - acc: 0.6094 - val_loss: 4.3866 - val_acc: 0.7222
+Epoch 3/10
+24/23 [==============================] - 5s 190ms/step - loss: 4.0494 - acc: 0.7435 - val_loss: 1.0163 - val_acc: 0.9340
+Epoch 4/10
+24/23 [==============================] - 5s 219ms/step - loss: 3.1330 - acc: 0.7995 - val_loss: 1.2144 - val_acc: 0.9201
+Epoch 5/10
+24/23 [==============================] - 5s 227ms/step - loss: 4.0351 - acc: 0.7487 - val_loss: 1.1573 - val_acc: 0.9236
+Epoch 6/10
+24/23 [==============================] - 5s 215ms/step - loss: 3.3249 - acc: 0.7930 - val_loss: 1.0633 - val_acc: 0.9340
+Epoch 7/10
+24/23 [==============================] - 5s 228ms/step - loss: 2.6396 - acc: 0.8346 - val_loss: 0.8395 - val_acc: 0.9479
+Epoch 8/10
+24/23 [==============================] - 5s 225ms/step - loss: 2.5717 - acc: 0.8372 - val_loss: 1.1193 - val_acc: 0.9306
+Epoch 9/10
+24/23 [==============================] - 5s 217ms/step - loss: 2.9105 - acc: 0.8190 - val_loss: 1.0074 - val_acc: 0.9375
+Epoch 10/10
+24/23 [==============================] - 5s 216ms/step - loss: 2.5275 - acc: 0.8424 - val_loss: 1.3991 - val_acc: 0.9132
+```
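The `24/23` shown in the progress bar above can be checked by hand. The sketch below assumes that `len(train_gen)` equals the number of batches needed to cover all images once, that is `ceil(samples / batch_size)`; the helper name `batches_per_epoch` is made up for this example.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


batch_size = 32
train_batches = batches_per_epoch(24331, batch_size)  # image counts from the log
test_batches = batches_per_epoch(8925, batch_size)

print(train_batches)               # 761
print(train_batches / batch_size)  # 23.78125
print(test_batches)                # 279
```

Dividing by `batch_size` a second time therefore gives about 23.8 steps, which matches the `24/23` in the log. Covering the whole training set once per epoch takes 761 steps.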
+## Visualize the training history
+```
+import matplotlib.pyplot as plt
+fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(8, 4))
+epochs = np.arange(1, len(history.history['loss']) + 1)
+# Loss per epoch
+ax1.set_title('loss')
+ax1.plot(epochs, history.history['loss'], label='train')
+ax1.plot(epochs, history.history['val_loss'], label='validation')
+ax1.set_xticks(epochs)
+ax1.legend()
+# Accuracy per epoch
+ax2.set_title('accuracy')
+ax2.plot(epochs, history.history['acc'], label='train')
+ax2.plot(epochs, history.history['val_acc'], label='validation')
+ax2.set_xticks(epochs)
+ax2.legend()
+plt.show()
+```
+![Image description](6edb4932990a76697570d01b948fda11.png)
+## Load the test data
+```python
+import glob
+import os
+from PIL import Image
+from keras.preprocessing import image
+from keras.utils import to_categorical
+label_to_id = test_gen.class_indices
+print(label_to_id)  # {'cat': 0, 'dog': 1}
+x = []
+y = []
+for path in glob.glob(os.path.join(test_dirpath, '*', '*.jpg')):
+    label = os.path.basename(os.path.dirname(path))
+    img = image.load_img(path, target_size=model.input_shape[1:3])
+    img = image.img_to_array(img)  # convert to a float array before preprocessing
+    img = preprocess_input(img)
+    x.append(img)
+    y.append(label_to_id[label])
+x = np.array(x)
+y = np.array(y)
+print('x.shape', x.shape)  # x.shape (8925, 224, 224, 3)
+print('y.shape', y.shape)  # y.shape (8925,)
+```
+## Run inference on the test data with the trained model and compute the accuracy
+```
+y_prob = model.predict(x)
+y_classes = y_prob.argmax(axis=-1)
+from sklearn.metrics import accuracy_score
+accuracy = accuracy_score(y, y_classes)
+print('accuracy: {:.2%}'.format(accuracy))  # accuracy: 92.45%
+```
+This gives an accuracy of about 92%.
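The original question also asked for the classification results as a CSV file. A minimal sketch with Python's `csv` module follows. It assumes a list of file paths collected in the same order as the predictions and the `{'cat': 0, 'dog': 1}` mapping printed above. The function name `write_predictions_csv`, the column names, the output file name, and the sample values are hypothetical.

```python
import csv


def write_predictions_csv(csv_path, img_paths, true_ids, pred_ids, id_to_label):
    """Write one row per image: path, true label, predicted label."""
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['path', 'true_label', 'predicted_label'])
        for path, true_id, pred_id in zip(img_paths, true_ids, pred_ids):
            writer.writerow(
                [path, id_to_label[int(true_id)], id_to_label[int(pred_id)]])


# Hypothetical values standing in for the test paths, y and y_classes.
id_to_label = {0: 'cat', 1: 'dog'}
write_predictions_csv(
    'predictions.csv',
    ['dataset/test/cat/1.jpg', 'dataset/test/dog/2.jpg'],
    [0, 1],
    [0, 0],
    id_to_label)
```

With the variables from the sections above, `id_to_label` could be built as `{v: k for k, v in label_to_id.items()}`, provided the paths are appended to a list inside the loading loop.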