model.fit raises a ValueError

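What I want to achieve

I want to learn how to use an autoencoder.
I want to be able to understand what the error means.

Problem / what I don't understand

I'm a Python beginner.
I tried to build an anomaly-detection AI by following the site below, but got the error message shown below.
https://qiita.com/michelle0915/items/28bc5b844bd0d7ab597b
I want to use RGB images of size 300×225 as training images.

Error message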

(56, 225, 300, 3)
Epoch 1/50
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 76
     74 # Run training
     75 print(train.shape)
---> 76 model.fit(
     77     train,
     78     train,
     79     batch_size=BATCH_SIZE,
     80     epochs=EPOCHS
     81 )

File ~\anaconda3\Lib\site-packages\keras\src\utils\traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

Cell In[4], line 70, in r_loss(y_true, y_pred)
     69 def r_loss(y_true, y_pred):
---> 70   return K.mean(K.square(y_true - y_pred), axis=[1,2,3])

ValueError: Dimensions must be equal, but are 225 and 228 for '{{node compile_loss/r_loss/sub}} = Sub[T=DT_FLOAT](data_1, functional_23_1/functional_21_1/activation_3_1/Sigmoid)' with input shapes: [1,225,300,3], [1,228,300,3].

Relevant source code

from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense, Conv2DTranspose, Reshape, Activation, LeakyReLU
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam
import numpy as np
import cv2
import glob

# Load and preprocess the training data
train_images = glob.glob('train_image/*')
train = []
for i in train_images:
    image = cv2.imread(i)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    train.append(image)

train = np.array(train)
train = train.astype('float32') / 255

# Training hyperparameters
LEARNING_RATE = 0.0005
BATCH_SIZE = 8
Z_DIM = 100
EPOCHS = 50

# Encoder
encoder_input = Input(shape=(225,300,3), name='encoder_input')
x = encoder_input
x = Conv2D(filters=32, kernel_size=3, strides=1, padding='same', name='encoder_conv_0')(x)
x = LeakyReLU()(x)
x = Conv2D(filters=32, kernel_size=3, strides=1, padding='same', name='encoder_conv_0_1')(x)
x = LeakyReLU()(x)
x = Conv2D(filters=64, kernel_size=3, strides=2, padding='same', name='encoder_conv_1')(x)
x = LeakyReLU()(x)
x = Conv2D(filters=64, kernel_size=3, strides=2, padding='same', name='encoder_conv_2')(x)
x = LeakyReLU()(x)
x = Conv2D(filters=64, kernel_size=3, strides=1, padding='same', name='encoder_conv_3')(x)
x = LeakyReLU()(x)
shape_before_flattening = K.int_shape(x)[1:]
x = Flatten()(x)
encoder_output = Dense(Z_DIM, name='encoder_output')(x)
encoder = Model(encoder_input, encoder_output)

# Decoder
decoder_input = Input(shape=(Z_DIM,), name='decoder_input')
x = Dense(np.prod(shape_before_flattening))(decoder_input)
x = Reshape(shape_before_flattening)(x)
x = Conv2DTranspose(filters=64, kernel_size=3, strides=1, padding='same', name='decoder_conv_t_0')(x)
x = LeakyReLU()(x)
x = Conv2DTranspose(filters=64, kernel_size=3, strides=2, padding='same', name='decoder_conv_t_1')(x)
x = LeakyReLU()(x)
x = Conv2DTranspose(filters=32, kernel_size=3, strides=2, padding='same', name='decoder_conv_t_2')(x)
x = LeakyReLU()(x)
x = Conv2DTranspose(filters=32, kernel_size=3, strides=1, padding='same', name='decoder_conv_t_2_5')(x)
x = LeakyReLU()(x)
x = Conv2DTranspose(filters=3, kernel_size=3, strides=1, padding='same', name='decoder_conv_t_3')(x)
x = Activation('sigmoid')(x)
decoder_output = x
decoder = Model(decoder_input, decoder_output)

# Connect the encoder and decoder
model_input = encoder_input
model_output = decoder(encoder_output)
model = Model(model_input, model_output)

# Training configuration (optimizer, loss function)
optimizer = Adam(learning_rate=LEARNING_RATE)

def r_loss(y_true, y_pred):
  return K.mean(K.square(y_true - y_pred), axis=[1,2,3])

model.compile(optimizer=optimizer, loss=r_loss)

# Run training
print(train.shape)
model.fit(
    train,
    train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS
)

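What I tried / looked up

Searched teratail, Google, etc.
Modified the source code myself
Asked an acquaintance
Other

Details and results of the above

The site uses 300×300 images, so I changed the code to match the images I want to use (width 300 × height 225).

Additional notes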

opencv-python:4.9.0.80
tensorflow:2.16.1
anaconda3 jupyter notebook

meg_

2024/03/31 08:15

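What are the dimensions of "train"?

Deleted user

2024/03/31 08:57 (edited)

To check the dimensions of train, I followed the advice in an answer to another question (https://teratail.com/questions/5p7neqw7sd3uex): "Add print(train.shape) right above "model.fit(..." and run it to check the shape of "train" (the training data)." I am doing that, but nothing is printed and I'm stuck.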

meg_

2024/03/31 09:03

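> nothing is printed and I'm stuck.
Even in that case, do you still get the same error as in the question (the ValueError in fit)?

Deleted user

2024/03/31 11:28 (edited)

The images are 300 wide × 225 high, so I tried swapping the order and added print(train.shape) right above "model.fit(..." before running, but I get the error below.

Deleted user

2024/03/31 11:11 (edited)

Program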

meg_

2024/03/31 10:44

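> print(train.shape) # I put it here, but nothing is printed
Isn't "(0,)" what was printed?

Deleted user

2024/03/31 11:22 (edited)

I rechecked the program and found that the folder name was wrong. My apologies. After fixing it and running again, I got an error again.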
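
A wrong folder name makes glob.glob return an empty list without raising anything, which is how train.shape ends up as (0,). A minimal sketch of a guard for that case follows; the helper name list_training_images is hypothetical and not part of the original code.

```python
import glob
import os


def list_training_images(pattern):
    """Return the sorted paths matching `pattern`; fail loudly if none match."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        # glob.glob silently returns [] for a missing or misspelled folder
        raise FileNotFoundError(
            f"no files match {pattern!r} (current directory: {os.getcwd()})"
        )
    return paths
```

In the question's code this would stand in for the bare glob.glob('train_image/*') call. Checking that cv2.imread did not return None for each path catches unreadable files in the same way.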

meg_

2024/03/31 11:12

It's hard to read in this comment field, so please add it to the question. (Questions can be edited.)

Deleted user

2024/03/31 11:27 (edited)

I'm sorry. I'll post the error output in the next comment.

Deleted user

2024/03/31 11:27

(56, 225, 300, 3)
Epoch 1/50
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 76
     74 # Run training
     75 print(train.shape)
---> 76 model.fit(
     77     train,
     78     train,
     79     batch_size=BATCH_SIZE,
     80     epochs=EPOCHS
     81 )

File ~\anaconda3\Lib\site-packages\keras\src\utils\traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
    119     filtered_tb = _process_traceback_frames(e.__traceback__)
    120     # To get the full stack trace, call:
    121     # `keras.config.disable_traceback_filtering()`
--> 122     raise e.with_traceback(filtered_tb) from None
    123 finally:
    124     del filtered_tb

Cell In[1], line 70, in r_loss(y_true, y_pred)
     69 def r_loss(y_true, y_pred):
---> 70   return K.mean(K.square(y_true - y_pred), axis=[1,2,3])

ValueError: Dimensions must be equal, but are 225 and 228 for '{{node compile_loss/r_loss/sub}} = Sub[T=DT_FLOAT](data_1, functional_5_1/functional_3_1/activation_1/Sigmoid)' with input shapes: [8,225,300,3], [8,228,300,3].

Deleted user

2024/03/31 13:46

Thank you for your comments. I've updated the question, so please take a look.

meg_

2024/03/31 16:01

> ValueError: Dimensions must be equal, but are 225 and 228 for '{{node compile_loss/r_loss/sub}} = Sub[T=DT_FLOAT](data_1, functional_23_1/functional_21_1/activation_3_1/Sigmoid)' with input shapes: [1,225,300,3], [1,228,300,3].
Perhaps the input images need to have equal height and width? For details, please consult an explanatory article on autoencoders.


1 answer

Best answer

The error occurs in r_loss(y_true, y_pred) because the shapes of y_true and y_pred do not match. In other words, the size changes after the data passes through the Model.

The cause is applying Conv2D with strides=2, padding='same' to an odd input size. With Conv2D(..., strides=2, padding='same', ...), an odd input size n produces an output of size (n+1)/2. Conv2DTranspose(..., strides=2, padding='same', ...), on the other hand, simply doubles the size.
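
The size rule can be checked with plain arithmetic, without building a model. The sketch below assumes the usual padding='same' behaviour, where Conv2D outputs ceil(n / stride) and Conv2DTranspose outputs n * stride; the two function names are hypothetical helpers, not Keras APIs.

```python
import math


def conv2d_same_out(n, stride):
    """Output length of Conv2D with padding='same': ceil(n / stride)."""
    return math.ceil(n / stride)


def conv2d_transpose_same_out(n, stride):
    """Output length of Conv2DTranspose with padding='same': n * stride."""
    return n * stride


print(conv2d_same_out(225, 2))            # 113, i.e. (225 + 1) / 2
print(conv2d_transpose_same_out(113, 2))  # 226, not the original 225
```

For an even size such as 300 the two operations cancel exactly (300 to 150 and back to 300), so the mismatch only appears when an odd size is halved.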

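Here strides=2 is applied twice, so the height goes 225 → 113 → 57 in the encoder and 57 → 114 → 228 in the decoder. (With 300 it goes 300 → 150 → 75 → 150 → 300, so there is no problem.)
If you want to keep two strides=2 layers, making the size a multiple of 4 (for example, dropping one row to get 224) should work.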
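
One way to apply this is to crop the loaded array before training. This is only a sketch: crop_to_multiple is a hypothetical helper, and the zero-filled array stands in for the real training data of shape (56, 225, 300, 3).

```python
import numpy as np


def crop_to_multiple(batch, multiple=4):
    """Crop the height and width of an (N, H, W, C) array down to multiples of `multiple`."""
    h, w = batch.shape[1:3]
    return batch[:, : h - h % multiple, : w - w % multiple, :]


# Dummy stand-in for the real `train` array from the question
dummy_train = np.zeros((56, 225, 300, 3), dtype="float32")
cropped = crop_to_multiple(dummy_train)
print(cropped.shape)  # (56, 224, 300, 3)
```

The encoder's Input shape would also need to change to (224, 300, 3) so that it matches the cropped data. Padding the images up to 228 rows instead of cropping is another option if no pixels should be lost.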

Posted 2024/04/01 05:10