About a CUDA error

Background / what I want to achieve

Python 3.8
CUDA 10.1
cuDNN 7.6
I am using tensorflow-gpu 2.3.

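I'm sorry, but I don't really understand what these errors mean. I tried a few things on my own, such as code that limits memory usage, but they had no effect.
I apologize for the trouble and would appreciate any help.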

Here are the code and the error messages.

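# The next four imports were missing from the posted snippet; the code below uses these names
from pickle import load  # loads the pickled tokenizer
from numpy import argmax  # used in generate_desc
from gtts import gTTS  # text-to-speech, used in text_reading
from playsound import playsound  # audio playback, used in text_reading
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.models import load_model
# load the tokenizer
tokenizer = load(open('tokenizer1.pkl', 'rb'))
# maximum sequence length, predefined at training time
max_length = 43
# load the model
model = load_model('model-ep004-loss2.778-val_loss3.210.h5')
# path of the image used by text_reading
text_reading_image_path = 'image2.png'
class Image_captioning:  # class that generates the caption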
    # extract features from each photo in the directory
    def extract_features(self,filename):
        print('a')
        # load the model
        model = VGG16()
        print('b')
        # re-structure the model
        model.layers.pop()
        print('c')
        model = Model(inputs=model.inputs, outputs=model.layers[-1].output)
        print('d')
        # load the photo
        image = load_img(filename, target_size=(224, 224))
        print('e')
        # convert the image pixels to a numpy array
        image = img_to_array(image)
        print('f')
        # reshape data for the model
        image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
        print('g')
        # prepare the image for the VGG model
        image = preprocess_input(image)
        print("h")
        # get features
        feature = model.predict(image, verbose=0)
        print('i')
        return feature
    # map an integer to a word
    def word_for_id(self,integer, tokenizer):
        for word, index in tokenizer.word_index.items():
            if index == integer:
                return word
        return None

    # generate a description for an image
    def generate_desc(self,model, tokenizer, photo, max_length):
        # seed the generation process
        in_text = 'startseq'
        # iterate over the whole length of the sequence
        for i in range(max_length):
            # integer encode input sequence
            sequence = tokenizer.texts_to_sequences([in_text])[0]
            # pad input
            sequence = pad_sequences([sequence], maxlen=max_length)
            # predict next word
            yhat = model.predict([photo,sequence], verbose=0)
            # convert probability to integer
            yhat = argmax(yhat)
            # map integer to word
            word = self.word_for_id(yhat, tokenizer)
            # stop if we cannot map the word
            if word is None:
                break
            # append as input for generating the next word
            in_text += ' ' + word
            # stop if we predict the end of the sequence
            if word == 'endseq':
                break
        return in_text
    def text_reading(self):
        photo = self.extract_features(text_reading_image_path)
        d = self.generate_desc(model, tokenizer, photo, max_length)
        description = d.replace('startseq',' ',1).replace('endseq',' ',1)
        print(description)
        sound = gTTS(text=description,lang='ja',slow=False)
        sound.save('/home/limlab/program/navigation/potential/voice/navigation.mp3')
        playsound('/home/limlab/program/voice/1.wav')
        playsound("/home/limlab/program/navigation/potential/voice/navigation.mp3") 
# The error messages start here
a
2021-11-21 11:58:17.479209: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-21 11:58:17.479540: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 921.60M (966367744 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-21 11:58:17.479817: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 829.44M (869731072 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-21 11:58:17.480066: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 746.50M (782758144 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-21 11:58:17.480313: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 671.85M (704482304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-21 11:58:17.480559: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 604.66M (634034176 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-21 11:58:18.318427: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 411041792 exceeds 10% of free system memory.
b
c
d
e
f
g
h
2021-11-21 11:58:18.680658: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-11-21 11:58:18.779763: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-11-21 11:58:18.781533: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-11-21 11:58:18.782176: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-11-21 11:58:18.782845: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-11-21 11:58:18.785736: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-11-21 11:58:18.786488: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-11-21 11:58:18.786504: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_ops_fused_impl.h:642 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

What I tried

I inserted the following code to try to fix the problem.

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

import os
import torch
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
torch.rand(1).cuda()
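
For reference, TensorFlow 2.x also has a native way to request on-demand GPU memory allocation, without the tensorflow.compat.v1 session used above. The following is a minimal sketch, assuming it runs before any model is created or any GPU operation is executed. The function name enable_gpu_memory_growth is hypothetical, and the ImportError fallback is there only so the snippet can run on a machine without TensorFlow. Note that this changes how memory is reserved, not how much the models need, so it may not help when the combined requirement exceeds what the card has.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Returns the number of GPUs that were configured (0 if none are visible
    or TensorFlow is not installed).
    """
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: treat a missing TensorFlow as "no GPUs".
        return 0

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized; otherwise
        # TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_gpu_memory_growth()
print(f"memory growth enabled on {configured} GPU(s)")
```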

jbpb0

2021/11/19 13:27

How about drastically reducing the batch size?

Deleted user

2021/11/20 02:37

Thank you for your reply. I think the error occurs at the call model = VGG16(), but where should I change things like the batch size or the image size? Is it in the vgg16.py file? I'm sorry for the trouble, and thank you in advance.

jbpb0

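2021/11/20 08:04

> I think the error occurs at the call model = VGG16()

I wouldn't expect much memory to be in use at that point yet. Does the error shown in the question appear just from running that line? If you copy the file you are running, delete everything after that line in the copy, and run the copy, does the error still appear?

> where should I change things like the batch size or the image size? Is it in the vgg16.py file?

It depends on how the code is written, but normally you specify them not in vgg16.py but in the code that calls it (the code that does something like model = VGG16()). How exactly to specify them depends on how your current code is written, so I can't say more unless you post that code.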

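Deleted user

2021/11/21 03:23

Thank you for the detailed reply, and I apologize for the poorly written question. I have just edited it. An error already appears right after "a" is printed in the terminal, so I assumed that is where the first error occurs. Also, execution never reaches "i": the program is forcibly terminated at feature = model.predict(image, verbose=0).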

jbpb0

2021/11/21 10:20 (edited)

Please also include the part where you import the modules. Is VGG16 the one from keras, or the one from tensorflow.keras?

Deleted user

2021/11/21 15:43

I am using tensorflow.keras.

jbpb0

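2021/11/22 01:28 (edited)

Do you get that error even if you run only the following in Python?

from tensorflow.keras.applications.vgg16 import VGG16
model = VGG16()

If the error appears even with only the above, the environment is unusable, so I would recommend rebuilding it from scratch.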

Deleted user

2021/11/22 02:35

I ran it, and no error appeared. In my program, the caption-generation function is reached after object detection has run (I only posted the caption-generation function). When I run only the caption-generation function, the following errors appear, but it does seem to run.

2021-11-22 11:21:32.434819: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-11-22 11:21:32.542277: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-11-22 11:21:33.214484: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-22 11:21:33.260858: W tensorflow/core/common_runtime/bfc_allocator.cc:312] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2021-11-22 11:21:33.294684: I tensorflow/stream_executor/cuda/cuda_driver.cc:775] failed to allocate 1.85G (1981808640 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-11-22 11:21:33.402043: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-22 11:21:33.486932: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.09GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-22 11:21:33.565611: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.15GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
男性がスケートボードをしている ("A man is skateboarding")

jbpb0

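2021/11/22 05:15 (edited)

> When I run only the caption-generation function, the following errors appear

The "I" in lines like "2021-11-22 11:21:32.434819: I" and the "W" in lines like "2021-11-22 11:21:33.214484: W" are not errors. The "E" in lines like "2021-11-21 11:58:18.779763: E" is an error. "I" and "W" messages are sometimes related to an error, but since the caption-generation function runs successfully on its own, the "I" and "W" messages shown in that case are not errors here.

Since the caption-generation function runs without errors on its own, the error in the question comes from this arrangement:

> the caption-generation function is reached after object detection has run

So I don't think examining only the caption-generation code posted in the question will reveal either the cause or a fix.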
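
The severity letters described above can also be picked out mechanically when a log is long. The following is a minimal sketch, assuming every TensorFlow log line starts with a timestamp followed by a single letter (I, W, E, or F for fatal); the names LOG_PREFIX, SEVERITY_NAMES, and severity are hypothetical, and the sample lines are taken from the logs posted in this thread.

```python
import re

# Matches the prefix of a TensorFlow C++ log line, for example
# "2021-11-21 11:58:18.779763: E tensorflow/stream_executor/...".
LOG_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) ")

SEVERITY_NAMES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def severity(line):
    """Return the severity name of a log line, or None if it has no log prefix."""
    match = LOG_PREFIX.match(line)
    return SEVERITY_NAMES[match.group(1)] if match else None


sample_lines = [
    "2021-11-22 11:21:32.434819: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10",
    "2021-11-22 11:21:33.214484: W tensorflow/core/common_runtime/bfc_allocator.cc:246] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.06GiB with freed_by_count=0.",
    "2021-11-21 11:58:18.779763: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED",
]

# Keep only the lines that TensorFlow itself marks as errors.
errors_only = [line for line in sample_lines if severity(line) == "error"]
for line in errors_only:
    print(line)
```

Filtering this way only narrows down where to look. As noted above, an "I" or "W" line can still be related to a later error.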

jbpb0

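2021/11/22 06:12 (edited)

Please use the nvidia-smi command
https://dev.classmethod.jp/articles/monitor-nvidia-gpu-usage-with-nvidia-smi-nvsmi/
to check how much GPU memory is used and how much is free in each of the following states:
・Before running any Python code
・While running only the object-detection code
・While running only the caption-generation function

Also, please check whether GPU memory usage during execution goes down when you add

import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

at the top of the object-detection code. If that reduces memory usage, you may be able to run object detection and the caption-generation function back to back.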
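
If it is easier to record the numbers from a script than to read them off the terminal, the check above can be sketched as follows. This assumes nvidia-smi is on the PATH and uses its CSV query mode; the function names parse_memory_csv and query_gpu_memory are hypothetical, and the sample string is made up for illustration, not a measurement from this thread.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs, one GPU per line, into a list of tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return [(used_mib, total_mib), ...], one entry per GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(output)


# Hypothetical sample output for a single 4 GB card (not measured values).
sample = "3270, 4096\n"
for used, total in parse_memory_csv(sample):
    print(f"{used} MiB used, {total - used} MiB free of {total} MiB")
```

On a machine with an NVIDIA driver installed, calling query_gpu_memory() in each of the three states listed above would give the figures to compare.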

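Deleted user

2021/11/22 06:58

I used the command nvidia-smi -lms 500 to monitor the GPU every 0.5 seconds. Object detection seems to use close to 80%, and caption generation close to 45%. Also, setting os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true' did not change the numbers much, so running them back to back looks difficult. I currently have a GeForce GTX 1650 with 4 GB; would it be better to move to an environment with 8 GB or more?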

jbpb0

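2021/11/22 07:10

> Object detection seems to use close to 80%, and caption generation close to 45%

Object detection and caption generation are presumably completely separate networks, so if you want to run them back to back, I think the combined memory usage of both has to be available at the same time.

It would be nice if TensorFlow could be made to release the memory it reserved for object detection once object detection finishes, but no way of doing that is known.

Does caption generation use the results of object detection? If not, you can exit Python once object detection finishes, start Python again, and then run caption generation. If it does, one option is to save the object-detection results to a file, exit and restart Python, load the results from the file, and then run caption generation.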
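
The "save to a file, exit, restart" approach above can also be driven from one parent script by running the first stage in a child Python process. The following is a minimal sketch using only the standard library; DETECTION_STAGE is a stand-in for the real object-detection code, the detection values are made up, and the name run_detection_in_child is hypothetical. Each run pays the cost of starting Python and loading the model again, so this suits batch processing better than latency-sensitive use.

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for the object-detection stage. In a real script this child process
# would import TensorFlow, run the detector, and write its results to the file
# given as the first argument. The detection values here are made up.
DETECTION_STAGE = """
import json, sys
detections = [{"label": "person", "box": [10, 20, 110, 220]}]
with open(sys.argv[1], "w") as f:
    json.dump(detections, f)
"""


def run_detection_in_child(result_path):
    """Run the detection stage in its own Python process and load its results.

    Any GPU memory the child reserved is returned to the driver when it exits.
    """
    subprocess.run([sys.executable, "-c", DETECTION_STAGE, result_path], check=True)
    with open(result_path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    detections = run_detection_in_child(os.path.join(tmp_dir, "detections.json"))

# The caption-generation stage would start here, with the GPU free again.
print(detections)
```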

jbpb0

2021/11/22 07:26

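The code in the question does no training, only inference (predict). Are object detection and caption generation trained separately, and run back to back only for inference? If so, and if inference on the CPU alone, without the GPU, is fast enough for you, that is another option. In general, CPU-side memory is easier to expand, so running out of memory is easier to resolve (depending on your PC's hardware configuration).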
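
One common way to try CPU-only inference without changing the model code is to hide the GPU from the process through the CUDA_VISIBLE_DEVICES environment variable. The following is a minimal sketch, assuming it is placed at the very top of the script, before TensorFlow is imported; the ImportError fallback is there only so the snippet can run on a machine without TensorFlow.

```python
import os

# Hide all CUDA devices from this process. This must run before TensorFlow is
# imported, because the CUDA runtime reads the variable when it is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import tensorflow as tf
except ImportError:
    # Assumption for this sketch: allow it to run where TensorFlow is absent.
    visible_gpus = []
else:
    visible_gpus = tf.config.experimental.list_physical_devices("GPU")

print(f"GPUs visible to TensorFlow: {len(visible_gpus)}")
```

Timing one full detection-plus-caption pass with this setting would show whether CPU inference is fast enough for the intended use.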

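Deleted user

2021/11/22 12:56

> Are object detection and caption generation trained separately, and run back to back only for inference?

I'm sorry I hadn't included this. I am loading the following:

model = load_model('/home/limlab/program/data/model-ep004-loss2.778-val_loss3.210.h5')

Object detection runs in real time, and the model is used to generate the caption.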

jbpb0

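2021/11/24 02:30

> Object detection runs in real time, and the model is used to generate the caption.

In that case, repeatedly exiting and restarting Python would take too long, so that won't work. If back-to-back inference for object detection and caption generation on the CPU alone is not fast enough either, then regarding

> I currently have a GeForce GTX 1650 with 4 GB; would it be better to move to an environment with 8 GB or more?

yes, I think so.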


1 answer

Best answer

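Using something like the article "Monitoring NVIDIA GPU usage with nvidia-smi" as a reference, please use the nvidia-smi command to check how much GPU memory is used and how much is free in each of the following states:
・Before running any Python code
・While running only the object-detection code
・While running only the caption-generation function

Object detection and caption generation are presumably completely separate networks, so if you want to run them back to back, I think the combined memory usage of both has to be available at the same time.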
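
As a rough check of that condition, the figures reported in the comment thread (close to 80% for object detection and close to 45% for caption generation, on a 4 GB GeForce GTX 1650) can be added up. This is only a sketch: it assumes the percentages refer to GPU memory and reflect what each stage really needs, rather than memory TensorFlow merely reserved in advance.

```python
# Figures reported in the comment thread (approximate readings from nvidia-smi).
total_gb = 4.0              # GeForce GTX 1650
detection_fraction = 0.80   # object detection, "close to 80%"
caption_fraction = 0.45     # caption generation, "close to 45%"

# Both networks must fit at the same time to run back to back.
needed_gb = (detection_fraction + caption_fraction) * total_gb
shortfall_gb = needed_gb - total_gb

print(f"needed about {needed_gb:.1f} GB, card has {total_gb:.1f} GB, "
      f"short by about {shortfall_gb:.1f} GB")
```

Under those assumptions the two stages together need roughly 5 GB, about 1 GB more than the card has.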

Posted 2021/11/24 02:33

jbpb0

Total score 7658