Object recognition on infrared images

What I want to do

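I want to run object recognition on infrared images.

I am using the TensorFlow Object Detection API.
My plan is to use the Object Detection Demo for the recognition.
I was able to run the demo itself.
It also worked without problems on some RGB images I picked up at random.
However, when I feed it an infrared image, I get the following error.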

ErrorCode
ValueError                                Traceback (most recent call last)
<ipython-input-64-b19082c2666b> in <module>()
      3   # the array based representation of the image will be used later in order to prepare the
      4   # result image with boxes and labels on it.
----> 5   image_np = load_image_into_numpy_array(image)
      6   # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      7   image_np_expanded = np.expand_dims(image_np, axis=0)

<ipython-input-61-af094dcdd84a> in load_image_into_numpy_array(image)
      2   (im_width, im_height) = image.size
      3   return np.array(image.getdata()).reshape(
----> 4       (im_height, im_width, 3)).astype(np.uint8)

ValueError: cannot reshape array of size 307200 into shape (480,640,3)

The image above is the one I used when this error occurred.

The images come from the VOT-TIR2015 Dataset.

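Based on the error message, I suspected the image size was the problem, so I tried changing it in several ways, including resizing the image to the same size as one that had worked.
This image's shape is (480, 640, 3), so I think it can be handled the same way as an RGB image.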

Or, unlike with RGB images, do I need to handle the depth differently after all?

Windows 10
protoc 3.4.0 win32


1 answer

Best answer

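reshape() does not change the number of elements, so the array before reshaping, np.array(image.getdata()), must contain 480 * 640 * 3 = 921600 elements.
The original image, however, is grayscale, so it contains only 480 * 640 = 307200 elements, which is why the error occurs.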
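
The mismatch can be reproduced with NumPy alone, without the image file. This is a minimal sketch, assuming a 640x480 single-channel image; the zero-filled array is only a stand-in for what np.array(image.getdata()) would return, and the variable names are made up for illustration.

```python
import numpy as np

height, width = 480, 640

# Stand-in for np.array(image.getdata()) on a single-channel image:
# a flat array with one value per pixel.
gray_flat = np.zeros(height * width, dtype=np.uint8)
print(gray_flat.size)  # 307200

# Reshaping to three channels needs 480 * 640 * 3 = 921600 values,
# so NumPy refuses and raises ValueError.
reshape_failed = False
try:
    gray_flat.reshape((height, width, 3))
except ValueError as exc:
    reshape_failed = True
    print(exc)

# The same data fits a two-dimensional (height, width) array.
gray_2d = gray_flat.reshape((height, width))
print(gray_2d.shape)  # (480, 640)
```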

If you want to forward a grayscale image without changing the network, convert the image itself to RGB.

python
import numpy as np
from PIL import Image

# Loaded as-is, the grayscale image has no channel axis.
img = Image.open('20c46dc5ee28b3968f42952f00762f0d.png')
img = np.array(img)
print(img.shape)  # (480, 640)

img = Image.open('20c46dc5ee28b3968f42952f00762f0d.png')
img = img.convert('RGB')  # convert grayscale to RGB
img = np.array(img)
print(img.shape)  # (480, 640, 3)
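
One way to apply this inside the demo is to do the conversion in load_image_into_numpy_array itself. The following is only a sketch, not the demo's original helper: it assumes 8-bit grayscale input, and a synthetic image created with Image.new stands in for an actual infrared frame.

```python
import numpy as np
from PIL import Image


def load_image_into_numpy_array(image):
    """Return a (height, width, 3) uint8 array for a PIL image."""
    # convert('RGB') copies a single grayscale channel into all three
    # channels; an image that is already RGB is returned unchanged.
    rgb = image.convert('RGB')
    return np.array(rgb, dtype=np.uint8)


# Synthetic 640x480 8-bit grayscale image (hypothetical test input).
gray = Image.new('L', (640, 480), color=128)
image_np = load_image_into_numpy_array(gray)
print(image_np.shape)  # (480, 640, 3)
```

Because the conversion happens before the array is built, the reshape call is no longer needed, and RGB inputs keep working as before. Images with a higher bit depth may need to be scaled to 8 bits first.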