Background / what I want to achieve
https://rightcode.co.jp/blog/information-technology/learn-yolov3-image-windows10-object-detection
This is my first attempt at machine learning. I am following the site above, but trained_weights_final.h5 is never generated, so I cannot move on to the next step. trained_weights_stage_1 is generated.
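(For reference: as far as I understand the keras-yolo3 train.py that the tutorial uses, trained_weights_stage_1.h5 is saved after the first stage, in which most layers are frozen, while trained_weights_final.h5 is saved only after the second, fully unfrozen stage finishes. So if the second stage crashes, only the stage 1 file appears. A small self-contained check of which files exist so far; the path assumes the script's default log_dir of logs/000/ and the helper is purely illustrative, not part of the tutorial:)

```python
# Quick self-check (hypothetical helper, not part of the tutorial): list which
# weight files train.py has written so far. Paths assume the script's default
# log_dir of 'logs/000/'; adjust if your setup uses a different directory.
import os

log_dir = "logs/000/"
for name in ("trained_weights_stage_1.h5", "trained_weights_final.h5"):
    path = os.path.join(log_dir, name)
    print(path, "->", "exists" if os.path.exists(path) else "missing")
```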
Problem / error message
```
… Epoch 48/50
1/1 [==============================] - 57s 57s/step - loss: 187.0126 - val_loss: 207.7613
Epoch 49/50
1/1 [==============================] - 57s 57s/step - loss: 190.6528 - val_loss: 200.3807
Epoch 50/50
1/1 [==============================] - 58s 58s/step - loss: 203.5468 - val_loss: 193.8124
Unfreeze all of the layers.
Train on 9 samples, val on 1 samples, with batch size 32.
Epoch 51/100
2021-06-21 03:48:18.164585: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] shape_optimizer failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2021-06-21 03:48:18.499121: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2021-06-21 03:48:21.918977: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] shape_optimizer failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2021-06-21 03:48:22.234926: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2021-06-21 03:49:00.514779: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:486 : Resource exhausted: OOM when allocating tensor with shape[32,26,26,512] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
  File "train.py", line 190, in <module>
    _main()
  File "train.py", line 84, in _main
    callbacks=[logging, checkpoint, reduce_lr, early_stopping])
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\keras\engine\training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "C:\Users\owner\anaconda3\envs\tf114\lib\site-packages\tensorflow\python\client\session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,26,26,512] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
	 [[{{node conv2d_33/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
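(The traceback ends in a ResourceExhaustedError: once all layers are unfrozen and training switches to the batch size of 32 reported in the log, the allocator cannot fit a [32, 26, 26, 512] float32 tensor in CPU memory. A rough, self-contained calculation of that single allocation, which hints at why RAM runs out:)

```python
# Back-of-the-envelope size of the tensor the allocator failed on
# (shape and dtype taken from the log: [32, 26, 26, 512], float32 = 4 bytes/element).
batch, h, w, c = 32, 26, 26, 512
size_mb = batch * h * w * c * 4 / 1024 ** 2
print(f"{size_mb:.1f} MB")  # ~42 MB for this one activation; backpropagating through
# the fully unfrozen network keeps many such tensors alive at the same time.
```

(If memory is indeed the problem, the usual workaround — an assumption on my part, not something the tutorial states — is to lower the batch_size used for the second training stage in train.py, e.g. to 2 or 4, so that epoch 51 onward fits in CPU memory and trained_weights_final.h5 can be saved at the end.)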
Relevant source code
What I tried
Describe here what you have tried to address the problem.
Additional information (framework/tool versions, etc.)
My machine has no GPU, so everything runs on the CPU.
In ImageSets/Main, I split the data into test, train, and val at a ratio of 6:2:10.