GPU out-of-memory (OOM) error during machine learning training

I am trying to run the accompanying code,
but I get the error shown below.

I tried reducing the batch size, but the error still occurs.
Is there anything else I can try?

Python
python fcn-12.3.1.py --train
2022-05-27 14:44:22.472707: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-27 14:44:22.780566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30720 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
1 Physical GPUs, 1 Logical GPUs
2022-05-27 14:44:23.746268: I tensorflow/stream_executor/cuda/cuda_driver.cc:739] failed to allocate 30.00G (32212254720 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
(omitted)
DNN library is not found.
         [[{{node fcn/ResNet56v2/conv2d/Conv2D}}]] [Op:__inference_train_function_18251]
2022-05-27 15:02:19.731278: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.

The code I want to run is here:
https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Keras/tree/master/chapter12-segmentation

Environment
NVIDIA GEFORCE RTX 3060

CUDA 11.4
tensorflow-gpu 2.7.0

jbpb0

2022/05/27 11:31

> I tried reducing the batch size, but the error still occurs.

When you reduce the batch size, what happens to the "32212254720 bytes" figure in "failed to allocate 30.00G (32212254720 bytes)"?
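As a quick check on the number asked about here, the arithmetic below compares the requested allocation with the logical device size reported earlier in the same log ("Created device ... with 30720 MB memory"). Both values are copied from the log in the question. The interpretation is an assumption, not something the asker confirmed: if the two match, the request may reflect TensorFlow reserving its memory pool up front rather than anything that depends on the batch size.

```python
# Values copied from the log in the question.
requested_bytes = 32212254720  # "failed to allocate 30.00G (32212254720 bytes)"
device_limit_mb = 30720        # "Created device ... with 30720 MB memory"

# Convert the failed request to GiB and MiB (binary units, as TensorFlow logs them).
requested_gib = requested_bytes / 1024**3
requested_mib = requested_bytes // 1024**2

print(f"requested: {requested_gib:.2f} GiB = {requested_mib} MiB")
print("matches logical device size:", requested_mib == device_limit_mb)
```

If the figure stays the same after the batch size is changed, that would be consistent with this reading, and the memory limit configured for the GPU would be worth checking against the memory the card physically has.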


1 answer

Best answer

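As you suspected, the model is most likely too large for the GPU, which causes the out-of-memory error.

The standard fix, as you mention, is to reduce the batch size.
If that is not enough, the next-best option is to forcibly shrink the number of parameters.

I have not tried this myself, so I cannot guarantee it, but changing filter=256 to 128 or 64 should cut memory use dramatically.
Please give it a try.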
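A rough sketch of why a smaller filter count helps: the weight count of a single convolution layer is kernel_height * kernel_width * input_channels * filters, plus one bias per filter. The 3x3 kernel and the choice of input channels equal to the filter count are illustrative assumptions, not values taken from the script in the repository, and `conv2d_params` is a hypothetical helper written for this example.

```python
def conv2d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """Number of trainable parameters in one 2D convolution layer with bias."""
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters
    return weights + biases


# Assumed setup: 3x3 kernels, input channels equal to the number of filters.
for filters in (256, 128, 64):
    params = conv2d_params(3, filters, filters)
    print(f"filters={filters}: {params} parameters")
```

Under these assumptions, each halving of the filter count cuts the layer's parameters to roughly a quarter. The output feature maps of the layer, which often dominate memory during training, shrink in proportion to the filter count.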

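I found the following note in the source code.
I would expect an RTX card to carry a far more monstrous amount of memory, but what batch size did you set, and how much memory does your GPU have?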

ResNet50 (v2) backbone.

Train with 6 layers of feature maps.
Pls adjust batch size depending on your GPU memory.
For 1060 with 6GB, --batch-size=1. For V100 with 32GB,
--batch-size=4

Posted 2022/05/27 10:28

Edited 2022/05/27 10:47