Want to implement StyleGAN2 on Google Colaboratory

Background / What I want to achieve

I am trying to implement StyleGAN2 on Google Colaboratory. (I am a beginner.)
I want to implement it with the ImageNet model and create a transition from image A to image B.
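Since the goal is a transition from image A to image B, one common approach is to interpolate between the two latent vectors and render each intermediate vector with the generator. Minimal sketch, assuming `z_a` and `z_b` are the latents behind A and B (for example, the vectors generated from two seeds); `lerp_frames` is a hypothetical helper, and the generator call itself is not shown. Plain linear interpolation is the simplest choice; spherical interpolation, or interpolating in StyleGAN2's intermediate W space, are common alternatives.

```python
from typing import List, Sequence


def lerp_frames(z_a: Sequence[float], z_b: Sequence[float], num_frames: int) -> List[List[float]]:
    """Return num_frames latent vectors going linearly from z_a to z_b (both endpoints included)."""
    if len(z_a) != len(z_b):
        raise ValueError("z_a and z_b must have the same length")
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at image A, 1.0 at image B
        frames.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return frames


if __name__ == "__main__":
    # Toy 2-dimensional latents; real StyleGAN2 latents are much longer (hypothetical values).
    for frame in lerp_frames([0.0, 2.0], [1.0, 0.0], 3):
        print(frame)
```

Each returned vector would then be passed to the generator to produce one frame of the transition.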

I am mainly following these sites:
・https://techpr.info/ml/stylegan2-colab/
・https://github.com/justinpinkney/awesome-pretrained-stylegan2

Problem / Error message

It ran fine until last week, but now it suddenly fails with "RuntimeError: No GPU devices found" and the GPU can no longer be used.
I have not touched the code since last week.

python
--2022-10-12 01:49:51--  https://battle.shawwn.com/sdc/stylegan2-imagenet-512/model.ckpt-533504.pkl
Resolving battle.shawwn.com (battle.shawwn.com)... 104.21.82.78, 172.67.155.55, 2606:4700:3036::6815:524e, ...
Connecting to battle.shawwn.com (battle.shawwn.com)|104.21.82.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 364035409 (347M) [application/octet-stream]
Saving to: ‘model.ckpt-533504.pkl.1’

model.ckpt-533504.p 100%[===================>] 347.17M  26.9MB/s    in 14s

2022-10-12 01:50:06 (24.6 MB/s) - ‘model.ckpt-533504.pkl.1’ saved [364035409/364035409]

Local submit - run_dir: results/00001-generate-images
dnnlib: Running run_generator.generate_images() on localhost...
Loading networks from "model.ckpt-533504.pkl"...
2022-10-12 01:50:08.183797: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.183918: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.184002: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.184080: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.184159: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.184377: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.184527: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:08.184546: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... 2022-10-12 01:50:09.054726: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.054855: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.054932: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.055003: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.055075: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.055152: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.055228: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2022-10-12 01:50:09.055246: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Failed!
Traceback (most recent call last):
  File "/content/stylegan2/run_generator.py", line 168, in <module>
    main()
  File "/content/stylegan2/run_generator.py", line 163, in main
    dnnlib.submit_run(sc, func_name_map[subcmd], **kwargs)
  File "/content/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/content/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/content/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/content/stylegan2/run_generator.py", line 21, in generate_images
    _G, _D, Gs = pretrained_networks.load_networks(network_pkl)
  File "/content/stylegan2/pretrained_networks.py", line 76, in load_networks
    G, D, Gs = pickle.load(stream, encoding='latin1')
  File "/content/stylegan2/dnnlib/tflib/network.py", line 297, in __setstate__
    self._init_graph()
  File "/content/stylegan2/dnnlib/tflib/network.py", line 154, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "<string>", line 495, in G_synthesis_stylegan2
  File "<string>", line 459, in layer
  File "<string>", line 103, in modulated_conv2d_layer
  File "<string>", line 72, in apply_bias_act
  File "/content/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
    return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
  File "/content/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
    cuda_kernel = _get_plugin().fused_bias_act
  File "/content/stylegan2/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/content/stylegan2/dnnlib/tflib/custom_ops.py", line 130, in get_plugin
    compile_opts += ' --gpu-architecture=%s' % _get_cuda_gpu_arch_string()
  File "/content/stylegan2/dnnlib/tflib/custom_ops.py", line 52, in _get_cuda_gpu_arch_string
    raise RuntimeError('No GPU devices found')
RuntimeError: No GPU devices found
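The traceback ends in `_get_cuda_gpu_arch_string` in `dnnlib/tflib/custom_ops.py`: before compiling `fused_bias_act.cu` it has to pass an `--gpu-architecture` value to the compiler, which presumably comes from a GPU that TensorFlow has registered, and TensorFlow printed "Skipping registering GPU devices..." twice. A simplified, hypothetical re-creation of that check in plain Python (not the stylegan2 source), assuming device descriptions of the form "... compute capability: 7.5" (the Tesla T4 has compute capability 7.5):

```python
import re
from typing import List


def gpu_arch_string(gpu_descs: List[str]) -> str:
    """Build an nvcc-style architecture string such as 'sm_75' from GPU device descriptions.

    Hypothetical illustration: with no GPU registered the list is empty and it
    fails with the same message as the traceback.
    """
    if not gpu_descs:
        raise RuntimeError("No GPU devices found")
    m = re.search(r"compute capability: (\d+)\.(\d+)", gpu_descs[0])
    if m is None:
        raise ValueError("compute capability not found in: %r" % gpu_descs[0])
    return "sm_%s%s" % (m.group(1), m.group(2))


if __name__ == "__main__":
    t4 = "device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5"
    print(gpu_arch_string([t4]))
    try:
        gpu_arch_string([])  # the situation after "Skipping registering GPU devices..."
    except RuntimeError as e:
        print(e)
```

In other words, the RuntimeError is a symptom: the real problem is the earlier "Could not load dynamic library" warnings that stopped TensorFlow from registering the T4.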

Relevant source code

python
!pip uninstall tensorflow -y
!pip install tensorflow-gpu==1.15
!pip install keras==2.2.4

!git clone https://github.com/NVlabs/stylegan2.git

!wget https://battle.shawwn.com/sdc/stylegan2-imagenet-512/model.ckpt-533504.pkl
# Run image generation
!python /content/stylegan2/run_generator.py generate-images \
    --network=model.ckpt-533504.pkl \
    --seeds=6600-6625 --truncation-psi=0.5
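`--seeds=6600-6625` asks for one image per seed. A minimal sketch of how such an argument can be expanded, assuming the range is inclusive at both ends and that a comma-separated list is also accepted (my reading of `run_generator.py`; worth confirming in the cloned repo). `parse_seeds` is a hypothetical name:

```python
import re
from typing import List


def parse_seeds(s: str) -> List[int]:
    """Parse 'a-b' (inclusive range) or 'a,b,c' (explicit list) into a list of ints."""
    m = re.match(r"^(\d+)-(\d+)$", s)
    if m:
        return list(range(int(m.group(1)), int(m.group(2)) + 1))
    return [int(x) for x in s.split(",")]


if __name__ == "__main__":
    seeds = parse_seeds("6600-6625")
    print(len(seeds), seeds[0], seeds[-1])
```

Under that assumption, the command above renders 26 images, one for each seed from 6600 to 6625.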

What I tried

Searching online suggested installing tensorflow-gpu with pip, but that made no difference.
I also tried restarting the runtime, among other things.

Additional information (framework/tool versions, etc.)

python
Wed Oct 12 02:58:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
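The nvidia-smi output shows that the driver itself does see the Tesla T4. The "CUDA Version: 11.2" in its header is the newest CUDA version this driver supports, not an installed CUDA toolkit, so it does not contradict the missing `libcudart.so.10.0` in the log. To read those two numbers out of nvidia-smi in a notebook, a minimal sketch (the helper name `parse_smi_header` is hypothetical, and the regexes assume the header layout shown above):

```python
import re
from typing import Dict


def parse_smi_header(text: str) -> Dict[str, str]:
    """Extract the driver version and the highest CUDA version the driver supports."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver": driver.group(1) if driver else "",
        "cuda": cuda.group(1) if cuda else "",
    }


if __name__ == "__main__":
    header = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"
    print(parse_smi_header(header))
```

In Colab, the text could come from `subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE, universal_newlines=True).stdout`.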

jbpb0

2022/10/12 08:30

> 2022-10-12 01:50:08.183797: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia

CUDA 10.0 is gone. Perhaps Colab removed what TensorFlow 1.* needs, as part of something like https://qiita.com/katoyu_try1/items/0228870c41d9ac54e6e9.
Figuring that installing CUDA 10.0 would make it work, I ran all of the code (split into two blocks) below the part that reads "Installing Cuda 10 (taken from the Tensorflow Docs)" in the question https://stackoverflow.com/questions/58936927/tensorflow-1-14-on-google-collab-no-gpu, and after that the code in this question ran. (Partway through it asks for your keyboard layout, and having to answer is a bit of a hassle.)
Below the code it says "And updating the LD_LIBRARY_PATH", but that step was not necessary. The steps written above "Installing Cuda 10 (taken from the Tensorflow Docs)" were not necessary either.

riku_university

2022/10/14 08:23

!apt-get --purge remove cuda nvidia* libnvidia-*
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove cuda-*
!apt autoremove
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo apt-get update
!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt install -y ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update
# Install NVIDIA driver
#!sudo apt-get install --no-install-recommends nvidia-driver-418
!sudo apt-get -y install nvidia-driver-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
#!sudo apt-get install --no-install-recommends \
!sudo apt-get install -y \
    cuda-10-0 \
    libcudnn7=7.6.2.24-1+cuda10.0 \
    libcudnn7-dev=7.6.2.24-1+cuda10.0
# Install TensorRT. Requires that libcudnn7 is installed above.
# !sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
!sudo apt-get install -y libnvinfer5=5.1.5-1+cuda10.0 \
    libnvinfer-dev=5.1.5-1+cuda10.0
!apt --fix-broken install

I ran the above, but the GPU is still not recognized.

!wget https://battle.shawwn.com/sdc/stylegan2-imagenet-512/model.ckpt-533504.pkl
# Run image generation
!python /content/stylegan2/run_generator.py generate-images \
    --network=model.ckpt-533504.pkl \
    --seeds=6600-6625 --truncation-psi=0.5

When I run the above, the error changes and the following is shown:

tensorflow.python.framework.errors_impl.NotFoundError: /content/stylegan2/dnnlib/tflib/_cudacache/fused_bias_act_455e3ae619ac31ebc8962a246b0550da.so: undefined symbol: _ZN10tensorflow12OpDefBuilder5InputESs

riku_university

2022/10/14 08:33

Solved it by changing the TensorFlow version from 1.14 to 1.15.

jbpb0

2022/10/14 09:06

> Solved it by changing the TensorFlow version from 1.14 to 1.15.

Huh? The code in the question says

> !pip install tensorflow-gpu==1.15

so is the code in the question different from the code you are actually running?

riku_university

2022/10/14 09:18

While searching for solutions and trying various things, I had apparently switched it to 1.14. Sorry for the confusion.
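Because the final fix in this thread came down to which TensorFlow version was installed (1.14 had crept in while experimenting), a guard that fails fast on the wrong version can save a round trip. Minimal sketch; the function names are hypothetical and the `(1, 15)` default is an assumption taken from the `tensorflow-gpu==1.15` pin in the question:

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """'1.15.5' -> (1, 15)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def check_tf_version(version: str, required: Tuple[int, int] = (1, 15)) -> None:
    """Raise if the installed TensorFlow is not the major.minor this notebook was set up for."""
    if major_minor(version) != required:
        raise RuntimeError(
            "TensorFlow %s is installed but %d.%d is expected" % ((version,) + required)
        )


if __name__ == "__main__":
    check_tf_version("1.15.5")  # passes silently
    try:
        check_tf_version("1.14.0")
    except RuntimeError as e:
        print(e)
```

In practice the version string would be `tf.__version__`, read by the same interpreter that later runs `run_generator.py`.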


1 answer

Best answer

> 2022-10-12 01:50:08.183797: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia

CUDA 10.0 is missing.
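To see directly which of these libraries the dynamic loader can find in the current runtime (before and after installing CUDA 10.0), a small check with `ctypes` can help. Minimal sketch; the library list is copied from the warnings in the question's log, and `loadable` is a hypothetical helper name:

```python
import ctypes
from typing import Dict, Iterable

# Library names taken from the "Could not load dynamic library" warnings in the question.
CUDA10_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def loadable(names: Iterable[str]) -> Dict[str, bool]:
    """Try to load each shared library and report which ones the dynamic loader can find."""
    result = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            result[name] = True
        except OSError:  # raised when the library cannot be found or loaded
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in loadable(CUDA10_LIBS).items():
        print(("OK      " if ok else "MISSING ") + name)
```

If every entry prints MISSING, TensorFlow 1.15 will skip GPU registration exactly as in the log.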

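Figuring that installing CUDA 10.0 would make it work, I went to the question
Tensorflow 1.14 on Google Collab - No GPU
and ran all of the code (split into two blocks) that appears below the part that reads
"Installing Cuda 10 (taken from the Tensorflow Docs)".
After that, the code in this question ran.
(Partway through, it asks for your keyboard layout and you have to answer, which is a bit of a hassle.)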

Below the code it says "And updating the LD_LIBRARY_PATH", but I did not need to do that.

I also did not need to do any of the steps written above "Installing Cuda 10 (taken from the Tensorflow Docs)".

[Addendum]
Google Colab's Python has since changed to 3.8, so at present you also need to run the following to switch to Python 3.7:

python
!sudo apt-get install python3.7
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
!sudo update-alternatives --config python3
!sudo apt install python3-pip

!python -m pip install --upgrade --force-reinstall pip

!pip install pillow
!pip install requests
!pip install protobuf==3.20

Partway through you will be asked the following; enter "2" to select "/usr/bin/python3.7".

  Selection    Path                Priority   Status
------------------------------------------------------------
* 0            /usr/bin/python3.8   2         auto mode
  1            /usr/bin/python3.6   1         manual mode
  2            /usr/bin/python3.7   1         manual mode
  3            /usr/bin/python3.8   2         manual mode
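After the switch it is worth confirming which interpreter the `!python` shell commands actually pick up, since `update-alternatives` only changes what `python3` resolves to for newly started processes. Minimal sketch; `parse_python_version` and `shell_python_version` are hypothetical helpers, and they assume `--version` prints a banner like `Python 3.7.15`:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_python_version(banner: str) -> Optional[Tuple[int, int]]:
    """'Python 3.7.15' -> (3, 7); None if the text is not a version banner."""
    m = re.search(r"Python (\d+)\.(\d+)", banner)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def shell_python_version(exe: str) -> Optional[Tuple[int, int]]:
    """Look up exe on PATH and run 'exe --version'; None if it is not found."""
    if shutil.which(exe) is None:
        return None
    # Older Pythons (including 2.x) print the banner to stderr, so merge it into stdout.
    out = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True).stdout
    return parse_python_version(out)


if __name__ == "__main__":
    for exe in ("python", "python3"):
        print(exe, "->", shell_python_version(exe))
```

If either command does not report (3, 7), re-run `update-alternatives --config python3` and pick the 3.7 entry again.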

Reference
How to install Python 3.7 in google colab?

Posted 2022/10/12 08:41

Edited 2023/01/16 07:01

jbpb0
