I don't understand how to read tflite detection results

I'm a beginner who started studying machine learning this year.

I'm trying to use DeepLab's trained model to create a tflite file and run detection with it.
https://github.com/tensorflow/models/tree/master/research/deeplab

I've managed to get detection running without errors, but no matter which image I feed in, the array that comes back contains nothing but zeros.

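I'd like to narrow down whether the detection step is wrong or the tflite file itself was built incorrectly. Could someone who knows tell me whether this is the right way to run detection?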

I'm using the PASCAL VOC dataset.

What I did

Created a .pb file with DeepLab's sample script (on Colab)

I cloned the tensorflow/models repository and used local_test_mobilenetv2.sh.
https://github.com/tensorflow/models/blob/master/research/deeplab/local_test_mobilenetv2.sh

The model export part of local_test_mobilenetv2.sh

```shell
python "${WORK_DIR}"/export_model.py \
  --logtostderr \
  --checkpoint_path="${CKPT_PATH}" \
  --export_path="${EXPORT_PATH}" \
  --model_variant="mobilenet_v2" \
  --num_classes=21 \
  --crop_size=513 \
  --crop_size=513 \
  --inference_scales=1.0
```

Converting to tflite

There seem to be settings for integer quantization to make the model smaller, but I don't understand them well enough yet, so I skipped that.

```shell
!tflite_convert \
  --graph_def_file=/content/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_trainval_set_mobilenetv2/export/frozen_inference_graph.pb \
  --output_file=/content/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_trainval_set_mobilenetv2/export/frozen_inference_graph.tflite \
  --output_format=TFLITE \
  --input_shape=1,513,513,3 \
  --input_arrays="MobilenetV2/MobilenetV2/input" \
  --change_concat_input_ranges=true \
  --output_arrays="ArgMax"
```

The resulting file was about 8 MB.
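The export step's `--crop_size` and the conversion step's `--input_shape` must describe the same resolution, and it is easy for the two to drift apart when the size changes. A minimal sketch of a hypothetical helper, `build_commands`, that derives both argument lists from one crop size using only the flags shown in this thread. The default tensor names `ImageTensor` and `SemanticPredictions` are the ones used in the best answer below; for the setup above you would pass `MobilenetV2/MobilenetV2/input` and `ArgMax` instead.

```python
def build_commands(crop_size, ckpt_path, pb_path, tflite_path,
                   input_array="ImageTensor", output_array="SemanticPredictions"):
    """Hypothetical helper: return (export_cmd, convert_cmd) argument lists
    that share a single crop size, so --crop_size and --input_shape agree."""
    export_cmd = [
        "python", "export_model.py",
        "--logtostderr",
        f"--checkpoint_path={ckpt_path}",
        f"--export_path={pb_path}",
        "--model_variant=mobilenet_v2",
        "--num_classes=21",
        f"--crop_size={crop_size}",  # height
        f"--crop_size={crop_size}",  # width
        "--inference_scales=1.0",
    ]
    convert_cmd = [
        "tflite_convert",
        f"--graph_def_file={pb_path}",
        f"--output_file={tflite_path}",
        "--output_format=TFLITE",
        f"--input_shape=1,{crop_size},{crop_size},3",  # batch, H, W, RGB
        f"--input_arrays={input_array}",
        "--change_concat_input_ranges=true",
        f"--output_arrays={output_array}",
    ]
    return export_cmd, convert_cmd


export_cmd, convert_cmd = build_commands(
    513, "model.ckpt-30000", "export/frozen_inference_graph.pb",
    "export/frozen_inference_graph.tflite")
print(" ".join(convert_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` (or printed and pasted into a Colab `!` cell); keeping the lists separate avoids shell-quoting issues with paths.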

Detection

```python
from PIL import Image
import numpy as np
import sys
import cv2
import tensorflow as tf

# Load the tflite model (the .tflite file has been moved)
interpreter = tf.lite.Interpreter(model_path="/frozen_inference_graph.tflite")
interpreter.allocate_tensors()

# Get the input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Check the input/output format
print('Input/output format')
print(input_details)
print(output_details)

# Get the input shape
input_shape = input_details[0]['shape']
print('Shape')
print(input_shape)

# Test image
test_img = "/content/deeplab_sample/bicycle513x513.jpg"
image = Image.open(test_img)
image = image.convert("RGB")
image = image.resize((513, 513))
# The input tensor is float32 (see input_details), so convert the pixels to float32
img_data = np.asarray(image, dtype=np.float32)

# Reshape the image to the input shape (1, 513, 513, 3)
reshaped_img = img_data.reshape(input_shape)
print('Input data')
print(reshaped_img)
interpreter.set_tensor(input_details[0]['index'], reshaped_img)

# Run inference
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print('Output data')
print(output_data)
print(np.count_nonzero(output_data))
```

Output

```
Input/output format
[{'name': 'MobilenetV2/MobilenetV2/input', 'index': 6, 'shape': array([  1, 513, 513,   3], dtype=int32), 'shape_signature': array([  1, 513, 513,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
[{'name': 'ArgMax', 'index': 0, 'shape': array([  1, 513, 513], dtype=int32), 'shape_signature': array([  1, 513, 513], dtype=int32), 'dtype': <class 'numpy.int64'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
Shape
[  1 513 513   3]
Input data
[[[[239. 247. 250.]
   [239. 247. 250.]
   [238. 246. 249.]
   ...
   [244. 247. 252.]
   [244. 247. 252.]
   [244. 248. 251.]]

  [[239. 247. 250.]
   [238. 246. 249.]
   [237. 245. 248.]
   ...
   [244. 247. 252.]
   [244. 247. 252.]
   [244. 248. 251.]]

  [[238. 245. 251.]
   [237. 244. 250.]
   [236. 243. 249.]
   ...
   [244. 248. 251.]
   [244. 248. 251.]
   [244. 248. 251.]]

  ...

  [[112. 120.  45.]
   [102. 111.  44.]
   [105. 112.  58.]
   ...
   [127. 113.  64.]
   [115. 101.  52.]
   [108.  88.  37.]]

  [[109. 126.   0.]
   [111. 128.  16.]
   [105. 118.  26.]
   ...
   [135. 121.  72.]
   [132. 118.  69.]
   [132. 112.  61.]]

  [[128. 135.  65.]
   [104. 111.  43.]
   [ 94. 100.  36.]
   ...
   [145. 124.  71.]
   [152. 131.  78.]
   [143. 126.  72.]]]]
Output data
[[[0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  ...
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]]
0
```
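For reading this kind of result: the `ArgMax` output has shape 1×513×513, and each element is the index of the class with the highest score for that pixel. With the 21 PASCAL VOC classes, index 0 is background, so an all-zero map means every pixel was classified as background. A minimal numpy sketch for summarizing such a map; `class_pixel_counts` is a hypothetical name, not part of TensorFlow.

```python
import numpy as np


def class_pixel_counts(seg_map):
    """Return {class_index: pixel_count} for a segmentation map of shape
    (1, H, W) or (H, W), such as the ArgMax output above."""
    classes, counts = np.unique(np.squeeze(seg_map), return_counts=True)
    return {int(c): int(n) for c, n in zip(classes, counts)}


# Example: a 4x4 map whose bottom half is class 15
demo = np.zeros((1, 4, 4), dtype=np.int64)
demo[0, 2:, :] = 15
print(class_pixel_counts(demo))  # {0: 8, 15: 8}
```

Called as `class_pixel_counts(output_data)`, the output shown above would give `{0: 263169}`, i.e. 513×513 background pixels, which is more informative than `np.count_nonzero` alone.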

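What I tried

Using different images

Additional information

I'm actually training with Mask R-CNN and would like to use the model trained there, but I couldn't find good information on converting an .h5 file to tflite, and I don't yet understand the parameters to pass during conversion, so I've set that aside for now. In the meantime, I'm trying the tflite conversion and detection steps with the DeepLab sample.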

https://github.com/matterport/Mask_RCNN


2 answers

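The question is closed, but here is some additional information.
I succeeded in converting Mask R-CNN to TensorFlow Lite, confirmed that it runs correctly, and benchmarked it. I've updated the scripts and models, so feel free to use them.
Latest Mask R-CNN conversion scripts and converted models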

As expected, the model's structure is very complex, so it runs quite slowly. Note that it won't convert correctly unless TensorFlow v2.2.0 or later is installed.

```console
$ pip3 install tf-nightly
```

Install the latest package that way. Also, some of the post-processing ops aren't supported by TensorFlow Lite, so you need to specify a special parameter in the conversion options.

The Mask R-CNN conversion script itself

```python
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]
```

Posted 2020/04/08 22:40

PINTO


thinkdice

2020/04/09 02:16

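Thanks for the information! I'll try it once I've finished with DeepLab. Come to think of it, someone in an issue I was watching also commented that they had succeeded. Here's the code they shared. You may already know it, but just in case: https://gist.github.com/bmabir17/754a6e0450ec4fd5e25e462af949cde6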

PINTO

2020/04/09 03:16

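Thank you for sharing. I actually knew about that issue and tried running it two days ago. However, it assumes a fairly legacy environment and a model of around 250 MB, so I've put it off for now. I think there are quite a few pitfalls. I'll try again when I have time.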


Best answer

I tried it. I can't tell whether it's the same model you're using, but I followed the same steps, and non-zero values appear to come back correctly. How about switching to a different trained model file?
https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md

```console
$ wget http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz
$ tar -zxvf deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz

# Edit export_model.py: replace the uint8 placeholder with a fixed-size float32 one
#input_image = tf.placeholder(tf.uint8, [1, None, None, 3], name=_INPUT_NAME)
input_image = tf.placeholder(tf.float32, [1, FLAGS.crop_size[0], FLAGS.crop_size[1], 3], name=_INPUT_NAME)

$ python3 export_model.py \
  --logtostderr \
  --checkpoint_path=deeplabv3_mnv2_pascal_trainval/model.ckpt-30000 \
  --export_path=export/deeplabv3_mnv2_pascal_trainval.pb \
  --model_variant=mobilenet_v2 \
  --num_classes=21 \
  --crop_size=257 \
  --crop_size=257 \
  --inference_scales=1.0

$ tflite_convert \
  --graph_def_file=export/deeplabv3_mnv2_pascal_trainval.pb \
  --output_file=export/deeplabv3_mnv2_pascal_trainval.tflite \
  --output_format=TFLITE \
  --input_shape=1,257,257,3 \
  --input_arrays=ImageTensor \
  --change_concat_input_ranges=true \
  --output_arrays=SemanticPredictions
```

```python
from PIL import Image
import numpy as np
import sys
import cv2
import tensorflow as tf

# Load the tflite model (the .tflite file has been moved)
interpreter = tf.lite.Interpreter(model_path="deeplabv3_mnv2_pascal_trainval.tflite")
interpreter.allocate_tensors()

# Get the input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Check the input/output format
print('Input/output format')
print(input_details)
print(output_details)

# Get the input shape
input_shape = input_details[0]['shape']
print('Shape')
print(input_shape)

# Test image
test_img = "individualImage.png"
image = Image.open(test_img)
image = image.convert("RGB")
image = image.resize((257, 257))
img_data = np.array(image, dtype='f4')  # float32, matching the input tensor

# Reshape the image to the input shape (1, 257, 257, 3)
reshaped_img = img_data.reshape(input_shape)
print('Input data')
print(reshaped_img)
interpreter.set_tensor(input_details[0]['index'], reshaped_img)

# Run inference
interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0]['index'])
print('Output data')
print(output_data)
print(np.count_nonzero(output_data))
```

```console
$ python3 test.py 
Input/output format
[{'name': 'ImageTensor', 'index': 9, 'shape': array([  1, 257, 257,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]
[{'name': 'SemanticPredictions', 'index': 272, 'shape': array([  1, 257, 257], dtype=int32), 'dtype': <class 'numpy.int32'>, 'quantization': (0.0, 0)}]
Shape
[  1 257 257   3]
Input data
[[[[ 13.  17.  16.]
   [ 21.  25.  23.]
   [ 13.  18.  14.]
   ...
   [ 31.  33.  20.]
   [ 30.  32.  19.]
   [ 31.  33.  20.]]

  [[ 14.  17.  17.]
   [ 20.  25.  23.]
   [ 17.  22.  18.]
   ...
   [ 31.  33.  20.]
   [ 30.  31.  18.]
   [ 31.  33.  20.]]

  [[ 12.  15.  14.]
   [ 18.  22.  19.]
   [ 17.  22.  19.]
   ...
   [ 31.  33.  20.]
   [ 30.  33.  20.]
   [ 29.  32.  19.]]

  ...

  [[ 23.  27.  26.]
   [ 23.  27.  26.]
   [ 25.  29.  28.]
   ...
   [ 96.  67.  43.]
   [ 90.  63.  39.]
   [ 83.  58.  36.]]

  [[ 23.  25.  26.]
   [ 26.  28.  29.]
   [ 26.  28.  29.]
   ...
   [103.  73.  49.]
   [ 91.  63.  41.]
   [ 84.  58.  37.]]

  [[ 25.  25.  28.]
   [ 26.  26.  28.]
   [ 27.  27.  29.]
   ...
   [104.  75.  52.]
   [ 93.  65.  44.]
   [ 87.  60.  41.]]]]
Output data
[[[ 0  0  0 ...  0  0  0]
  [ 0  0  0 ...  0  0  0]
  [ 0  0  0 ...  0  0  0]
  ...
  [15 15 15 ... 15 15 15]
  [15 15 15 ... 15 15 15]
  [15 15 15 ... 15 15 15]]]
40524
```
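In this output, 15 is the PASCAL VOC class index for "person", so the 40524 non-zero pixels are where the model found a person. A minimal sketch that turns a class index into a label, pixel count, and bounding box, assuming the standard 21-class PASCAL VOC order (background first) used by DeepLab's PASCAL checkpoints. `VOC_LABELS` and `class_region` are hypothetical names for illustration.

```python
import numpy as np

# Standard 21-class PASCAL VOC order (assumed to match the checkpoint used)
VOC_LABELS = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tv",
]


def class_region(seg_map, class_id):
    """Return (label, pixel_count, bbox) for one class in a (1, H, W) or
    (H, W) segmentation map. bbox is (ymin, xmin, ymax, xmax), inclusive,
    or None when the class does not appear."""
    mask = np.squeeze(seg_map) == class_id  # boolean mask for this class
    count = int(mask.sum())
    if count == 0:
        return VOC_LABELS[class_id], 0, None
    ys, xs = np.nonzero(mask)  # row/column indices of the class pixels
    bbox = (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))
    return VOC_LABELS[class_id], count, bbox
```

For example, `class_region(output_data, 15)` would return `("person", 40524, (...))` for the run above; the mask itself (`np.squeeze(output_data) == 15`) can be used directly for overlays.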

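Sample Google Colaboratory notebook
You can inspect the model structure of the checkpoint, .pb, and .tflite files with NETRON here.
Also, in deeplab cityscape edgetpu #3 I explained the quantization steps to an engineer overseas, which may be useful. My GitHub repository PINTO0309/PINTO_model_zoo contains sample conversion scripts along with the models before and after conversion. Feel free to refer to it.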

Incidentally, TensorFlow Lite doesn't officially support converting Mask R-CNN. It can be converted with some workarounds, but it's difficult, so I don't recommend it.
Mask R-CNN conversion scripts: https://github.com/PINTO0309/PINTO_model_zoo/tree/master/08_mask_rcnn_inceptionv2

I hope this helps.

Posted 2020/03/31 15:49

Edited 2020/04/03 15:14

PINTO


thinkdice

2020/04/01 06:54

Thank you, PINTO. I had seen your repository while researching tflite conversion from Mask R-CNN. Thank you for publishing such useful information. I'll switch models and try again.

PINTO

2020/04/01 13:57

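I share this view, as do people at Google: Mask R-CNN is far too heavy and ill-suited to running on tflite. Since you're already evaluating DeepLab, you have probably reached the answer yourself, but combining a lightweight segmentation model with MobileNet is the realistic approach. Good luck.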

thinkdice

2020/04/03 09:42 (edited)

I tried your code as-is: https://colab.research.google.com/drive/141w_AHytH2KTid9jrahH5vginCCoCU6J
In the end, the detection step fails with the RuntimeError below, so I haven't succeeded yet. Does it have to be done differently in a Colab environment?
RuntimeError: tensorflow/lite/kernels/depthwise_conv.cc:108 params->depth_multiplier * SizeOfDimension(input, 3) != SizeOfDimension(filter, 3) (0 != 32)Node number 17 (DEPTHWISE_CONV_2D) failed to prepare.
Also, tflite_convert raised an error, so I'm not confident the tflite conversion itself is correct. The error went away after adding these flags:
--inference_input_type=QUANTIZED_UINT8 \ --inference_type=FLOAT \ --std_dev_values=128 \ --mean_values=128 \
Error message:
F tensorflow/lite/toco/tooling_util.cc:2277] Check failed: array.data_type == array.final_data_type Array "ImageTensor" has mis-matching actual and final data types (data_type=uint8, final_data_type=float). Fatal Python error: Aborted
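For reference on the flags above: the tflite_convert command-line reference describes `--mean_values` and `--std_dev_values` as defining how quantized uint8 input maps to real values, `real_value = (quantized_value - mean_value) / std_dev_value`. A small sketch of that formula with the values used in this comment (mean 128, std 128); `dequantize_input` is a hypothetical name, and this only reproduces the arithmetic, not the converter itself.

```python
def dequantize_input(q, mean_value=128.0, std_dev_value=128.0):
    """real = (q - mean) / std: the documented meaning of
    --mean_values / --std_dev_values for uint8 model input."""
    return (q - mean_value) / std_dev_value


for q in (0, 128, 255):
    print(q, dequantize_input(q))
# 0 -1.0
# 128 0.0
# 255 0.9921875
```

With mean 128 and std 128, the 0 to 255 pixel range therefore maps to roughly [-1, 1] rather than staying at raw pixel values.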

thinkdice

2020/04/03 07:26

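Until now I've studied without looking inside models, but I'm starting to feel that approach has hit its limit. What should I study so that I can work out on my own what to pass to flags like input_arrays, inference_input_type, and inference_type? (Sorry for the multiple posts.)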

PINTO

2020/04/03 15:12 (edited)

My apologies as well. One line of export_model.py needed to be rewritten; please see lines 92 and 93 here: https://colab.research.google.com/drive/11I_tKY9CaPGYhHkAq73cSFMfTtFb05RN
#input_image = tf.placeholder(tf.uint8, [1, None, None, 3], name=_INPUT_NAME)
input_image = tf.placeholder(tf.float32, [1, FLAGS.crop_size[0], FLAGS.crop_size[1], 3], name=_INPUT_NAME)
With its default options, the TensorFlow Lite converter requires the INPUT and OUTPUT types to match. The INPUT type of the downloaded model is UINT8, so it has to be changed to FLOAT32. Also, although my sample used a 257x257 crop size, you can change it freely with the --crop_size=257 parameter. When examining the model structure, it's enough to look at the INPUT and OUTPUT operations. Embarrassingly, I don't look inside models much either :) I've corrected the answer accordingly. For an explanation of the available parameters, this page of the official documentation is helpful: https://www.tensorflow.org/lite/convert/cmdline_reference
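Because the converted model expects one specific dtype and shape, it can help to build the input from `interpreter.get_input_details()` instead of hard-coding it, so a float32/uint8 mismatch shows up as a clear error. A minimal numpy sketch; `prepare_input` is a hypothetical helper, and the dict passed to it only needs the `'shape'` and `'dtype'` keys that `get_input_details()` returns (as seen in the outputs in this thread).

```python
import numpy as np


def prepare_input(image_hwc, input_detail):
    """Cast an H x W x 3 image array to the model's input dtype and add the
    batch dimension, raising if the result doesn't match the model's shape."""
    expected = tuple(int(d) for d in input_detail["shape"])
    batch = np.expand_dims(np.asarray(image_hwc), axis=0).astype(input_detail["dtype"])
    if batch.shape != expected:
        raise ValueError(f"image shape {batch.shape} does not match model input {expected}")
    return batch


# Stand-in for get_input_details()[0], using the values from the output above
detail = {"shape": np.array([1, 257, 257, 3], dtype=np.int32), "dtype": np.float32}
img = np.zeros((257, 257, 3), dtype=np.uint8)  # e.g. np.array(pil_image)
x = prepare_input(img, detail)
print(x.shape, x.dtype)  # (1, 257, 257, 3) float32
```

In the detection script this would be used as `interpreter.set_tensor(input_details[0]['index'], prepare_input(np.array(image), input_details[0]))`.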

thinkdice

2020/04/06 06:47 (edited)

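With the modified export_model.py, I was able to go through export → tflite conversion → detection. I also tried a ckpt file from my earlier training, and since non-zero values come back, detection seems to be working. I've finally reached the point where I can try my own data. Thank you! I'll also study so that I can look into the INPUT and OUTPUT operations myself!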

thinkdice

2020/04/28 01:58

It's been a while, but may I ask one more question? The final runtime environment will handle 256x256 images, so I'm now modifying things based on your code. Is there a reason for the odd number 257 instead of 256?

PINTO

2020/04/28 03:25

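My understanding is that a constraint in the training protocol requires the size to be a multiple of 8 plus 1, or a multiple of 16 plus 1 (the "multiple" part seems to be what's called output_stride), which is why 1 is added at the end. The discussion below explains the reasoning, for your reference. I also checked the training script and confirmed it uses the formula above. https://github.com/tensorflow/models/issues/3886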
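The size rule described here (output_stride × k + 1) can be sketched as follows. `valid_crop_size` rounds a desired size up to the nearest size of that form. `feature_size` assumes feature maps are sized as (size − 1) / output_stride + 1, which is the usual explanation of the +1; treat that formula as an assumption rather than a quote of DeepLab's code. Both function names are hypothetical.

```python
def valid_crop_size(desired, output_stride=16):
    """Smallest size >= desired of the form output_stride * k + 1."""
    k = -(-(desired - 1) // output_stride)  # ceil((desired - 1) / output_stride)
    return output_stride * k + 1


def feature_size(crop, output_stride=16):
    """(crop - 1) / output_stride + 1: a whole number only for valid sizes."""
    return (crop - 1) / output_stride + 1


print(valid_crop_size(256), valid_crop_size(513))  # 257 513
print(feature_size(257), feature_size(256))        # 17.0 16.9375
```

Under this assumption, 257 maps to an exact 17×17 grid at stride 16 while 256 does not land on a whole number, which is one way to see why 256 itself is avoided.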

thinkdice

2020/04/28 04:46

I see, so it's something like a margin for handling the edge pixels. Thank you!

PINTO

2020/05/02 07:59

With DeeplabV3-plus, accuracy was fairly poor at 256x256... I recommend something around 513x513. https://github.com/PINTO0309/PINTO_model_zoo/blob/master/README.md#sample4---semantic-segmentation-deeplabv3-plus-256x256

thinkdice

2020/05/15 02:47 (edited)

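Sorry, I only just noticed your comment. Thank you! This is exactly what I'm trying to do, and as you say, I'm currently struggling to get good accuracy from tflite detection at 257x257... Increasing the step count to around 30K improves the detection results and boundary accuracy in vis.py, but when I carry that model over to tflite, accuracy drops sharply instead. In particular, at 3K steps or more, tflite accuracy falls a lot. I train at 513x513 and resize to 257 in export_model.py, with no quantization. I suspected I was doing something wrong in the ckpt → tflite conversion and looked into it, but I haven't found any parameter that seems relevant. Am I overlooking something?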

PINTO

2020/05/15 03:42 (edited)

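First, since inference works, I don't think there's a mistake in your procedure. But if I may point out the basics: quantization is expected to cost a certain amount of accuracy, and in exchange performance improves greatly. Values that carried 32 bits of information during training are reduced to 8 bits, a quarter of the information, so you can probably see why. Input resolution also has a large effect on accuracy. Here too you've halved both the width and the height, cutting the information to a quarter in total, so once quantization is applied on top of that, the feature information the model can extract from the image drops sharply and accuracy falls a lot. Whether or not you quantize, I think you need to adjust the input resolution to a somewhat larger multiple of 8. Performance and accuracy are a trade-off. The official models presumably use a relatively large 513x513 resolution because they want to guarantee accuracy they can publish with confidence.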
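The two "quarter" figures in this comment can be checked with the resolutions used in this thread (513x513 for training, 257x257 after export) and the sizes of float32 and int8 values:

```python
# Pixels per input image at the two resolutions discussed
full = 513 * 513   # official / training resolution
small = 257 * 257  # resolution used after export_model.py
print(full, small, round(small / full, 3))  # 263169 66049 0.251

# Bytes per value: float32 vs 8-bit quantized
fp32_bytes, int8_bytes = 4, 1
print(int8_bytes / fp32_bytes)  # 0.25
```

So halving each side keeps only about 25% of the pixels, independently of the 4x reduction per value that quantization adds.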

thinkdice

2020/05/18 01:46

Thank you! I see. So, as a basic matter, reducing the image size lowers accuracy. It dropped more than I expected. Doing semantic segmentation on mobile still comes with constraints...

