Getting "RuntimeError: Unexpected error from cudaGetDeviceCount()"

Background

When I ran the code from a terminal, I got the following error, which I have not seen before:
"RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination"

What I want to achieve

Resolve the RuntimeError.

Problem and error message

$ python demos/test_emoca_on_images.py

Taking config of stage 'detail'
dict_keys(['coarse', 'detail'])
Looking for checkpoint in '/home/yuuri/Ascender/emoca/assets/EMOCA/models/EMOCA/detail/checkpoints'
Found 1 checkpoints
 - /home/yuuri/Ascender/emoca/assets/EMOCA/models/EMOCA/detail/checkpoints/deca-epoch=03-val_loss/dataloader_idx_0=9.44489288.ckpt
Selecting checkpoint '/home/yuuri/Ascender/emoca/assets/EMOCA/models/EMOCA/detail/checkpoints/deca-epoch=03-val_loss/dataloader_idx_0=9.44489288.ckpt'
Loading checkpoint '/home/yuuri/Ascender/emoca/assets/EMOCA/models/EMOCA/detail/checkpoints/deca-epoch=03-val_loss/dataloader_idx_0=9.44489288.ckpt'
Creating classic detail generator.
fc.weight  not available in reconstructed resnet
fc.bias  not available in reconstructed resnet
copy resnet state dict finished!
creating the FLAME Decoder
/home/yuuri/Ascender/emoca/gdl/models/DecaFLAME.py:92: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(lmk_embeddings['dynamic_lmk_faces_idx'], dtype=torch.long))
/home/yuuri/Ascender/emoca/gdl/models/DecaFLAME.py:94: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(lmk_embeddings['dynamic_lmk_bary_coords'], dtype=self.dtype))
fc.weight  not available in reconstructed resnet
fc.bias  not available in reconstructed resnet
copy resnet state dict finished!
fc.weight  not available in reconstructed resnet
fc.bias  not available in reconstructed resnet
copy resnet state dict finished!
/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/pytorch3d/io/obj_io.py:533: UserWarning: Mtl file does not exist: /home/yuuri/Ascender/emoca/assets/FLAME/geometry/template.mtl
  warnings.warn(f"Mtl file does not exist: {f}")
Traceback (most recent call last):
  File "demos/test_emoca_on_images.py", line 89, in <module>
    main()
  File "demos/test_emoca_on_images.py", line 59, in main
    emoca.cuda()
  File "/home/yuuri/Ascender/emoca/gdl/models/DECA.py", line 293, in cuda
    super().cuda(device)
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 127, in cuda
    return super().cuda(device=device)
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 637, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 637, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/yuuri/anaconda3/envs/work36_cu11/lib/python3.6/site-packages/torch/cuda/__init__.py", line 172, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination

Relevant source code

python
from gdl_apps.EMOCA.utils.load import load_model
from gdl.datasets.ImageTestDataset import TestData
import gdl
import numpy as np
import os
import torch
from skimage.io import imsave
from pathlib import Path
from tqdm import auto
import argparse
from gdl_apps.EMOCA.utils.io import save_obj, save_images, save_codes, test


def main():
    parser = argparse.ArgumentParser()
    # add the input folder arg 
    parser.add_argument('--input_folder', type=str, default= str(Path(gdl.__file__).parents[1] / "data/EMOCA_test_example_data/images/affectnet_test_examples"))
    parser.add_argument('--output_folder', type=str, default="image_output", help="Output folder to save the results to.")
    parser.add_argument('--model_name', type=str, default='EMOCA', help='Name of the model to use.')
    parser.add_argument('--path_to_models', type=str, default=str(Path(gdl.__file__).parents[1] / "assets/EMOCA/models"))
    parser.add_argument('--save_images', type=bool, default=True, help="If true, output images will be saved")
    parser.add_argument('--save_codes', type=bool, default=False, help="If true, output FLAME values for shape, expression, jaw pose will be saved")
    parser.add_argument('--save_mesh', type=bool, default=False, help="If true, output meshes will be saved")

    args = parser.parse_args()


    # path_to_models = '/ps/scratch/rdanecek/emoca/finetune_deca'
    # path_to_models = '/is/cluster/work/rdanecek/emoca/finetune_deca'
    path_to_models = args.path_to_models
    input_folder = args.input_folder
    output_folder = args.output_folder
    model_name = args.model_name

    mode = 'detail'
    # mode = 'coarse'

    # 1) Load the model
    emoca, conf = load_model(path_to_models, model_name, mode)
    emoca.cuda()
    emoca.eval()

    # 2) Create a dataset
    dataset = TestData(input_folder, face_detector="fan", max_detection=20)

    ## 4) Run the model on the data
    for i in auto.tqdm( range(len(dataset))):
        batch = dataset[i]
        vals, visdict = test(emoca, batch)
        # name = f"{i:02d}"
        current_bs = batch["image"].shape[0]

        for j in range(current_bs):
            name =  batch["image_name"][j]

            sample_output_folder = Path(output_folder) / name
            sample_output_folder.mkdir(parents=True, exist_ok=True)

            if args.save_mesh:
                save_obj(emoca, str(sample_output_folder / "mesh_coarse.obj"), vals, j)
            if args.save_images:
                save_images(output_folder, name, visdict, with_detection=True, i=j)
            if args.save_codes:
                save_codes(Path(output_folder), name, vals, i=j)

    print("Done")


if __name__ == '__main__':
    main()

I searched but could not find a similar error, so I am asking here.
I would appreciate any answers.

jbpb0

2022/07/26 23:50 (edited)

The error is raised at "torch._C._cuda_init()", so I suspect CUDA is not properly usable from PyTorch in your environment. Does the following print "True" when you run it in Python?

    import torch
    print(torch.cuda.is_available())

If the result is "True", please also share the output of the following:

    print(torch.__version__)
    print(torch.version.cuda)
    print(torch.cuda.device_count())
    print(torch.cuda.current_device())
    print(torch.cuda.get_device_name())

jbpb0

2022/07/26 23:44 (edited)

If you are running inside a Docker environment, please edit the question and add that information.

waaaaaaaa

2022/07/27 00:40

Thank you for your reply. When I ran print(torch.cuda.is_available()), it printed False. Is this the cause of the error?

waaaaaaaa

2022/07/27 00:45

Here is the message that was printed: "UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at /opt/conda/conda-bld/pytorch_1631630866422/work/c10/cuda/CUDAFunctions.cpp:115.) return torch._C._cuda_getDeviceCount() > 0"

jbpb0

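2022/07/29 00:32 (edited)

> When I ran print(torch.cuda.is_available()), it printed False.

That has to be True. Please check that the GPU model, the graphics driver version, the CUDA version, and the PyTorch version are all compatible with one another.
Also, just to confirm: you are not running in a Docker environment, correct?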
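As a supplement to the compatibility check suggested above: Error 803 corresponds to cudaErrorSystemDriverMismatch in the CUDA runtime, which reports a mismatch between the display driver and the CUDA driver library. One situation that can produce this is a driver package being updated while the older kernel module is still loaded. The following is only a rough sketch for comparing the two on Linux. The helper names are made up for illustration, the library directory is an assumption based on the Debian/Ubuntu layout, and the regular expression is a guess at the usual format of /proc/driver/nvidia/version, so adjust both for your system.

```python
import re
from pathlib import Path

# Assumptions: the /proc entry is provided by the NVIDIA kernel module on Linux;
# the library directory is the Debian/Ubuntu location and may differ elsewhere.
PROC_VERSION = Path("/proc/driver/nvidia/version")
LIB_DIR = Path("/usr/lib/x86_64-linux-gnu")


def parse_kernel_module_version(text):
    """Extract the version that follows 'Kernel Module' in the NVRM line, if any."""
    match = re.search(r"Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def kernel_module_version(path=PROC_VERSION):
    """Version of the currently loaded kernel module, or None if it cannot be read."""
    try:
        return parse_kernel_module_version(Path(path).read_text())
    except OSError:
        return None


def libcuda_versions(lib_dir=LIB_DIR):
    """Versions encoded in file names such as libcuda.so.<version> under lib_dir."""
    versions = set()
    for candidate in Path(lib_dir).glob("libcuda.so.*"):
        # Skips the libcuda.so.1 symlink, which carries no dotted version.
        match = re.fullmatch(r"libcuda\.so\.(\d+(?:\.\d+)+)", candidate.name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)


if __name__ == "__main__":
    kernel = kernel_module_version()
    user = libcuda_versions()
    print("loaded kernel module:", kernel)
    print("libcuda on disk:", user)
    if kernel and user and kernel not in user:
        print("The versions differ; a reboot or a driver reinstall may be needed.")
```

If the two versions differ, rebooting (so that the new kernel module gets loaded) is usually the first thing to try. If they match, the cause is probably elsewhere, for example a container that sees a different driver library than the host.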

1 answer

Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination

The error is raised at "torch._C._cuda_init()", so I suspect CUDA is not properly usable from PyTorch in your environment.
Running the following in Python has to print "True":

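python
import torch
print(torch.cuda.is_available())

If it prints "False", you will probably need to revisit the installation, starting from PyTorch and CUDA.
For example, check whether the GPU model, the graphics driver version, the CUDA version, and the PyTorch version are all compatible with one another.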
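To gather the versions mentioned above in one place, something like the following could be used. It is only a sketch: the function names are made up for illustration, it assumes nvidia-smi is on the PATH (it returns an empty list otherwise), and it avoids subprocess options newer than Python 3.6 because the traceback shows a python3.6 environment.

```python
import shutil
import subprocess


def torch_report():
    """PyTorch-side versions; {'torch': None} if PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    # Device queries raise when CUDA cannot be initialized, so guard them.
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


def parse_smi_csv(text):
    """Turn 'name, driver_version' CSV lines into (name, driver_version) tuples."""
    rows = []
    for line in text.strip().splitlines():
        name, _, driver = line.rpartition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def driver_report():
    """GPU name and driver version as seen by nvidia-smi; [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_smi_csv(result.stdout)


if __name__ == "__main__":
    print("PyTorch:", torch_report())
    print("Driver :", driver_report())
```

The value of built_for_cuda is the CUDA version the PyTorch binary was built against, and the driver version reported by nvidia-smi has to be recent enough for it. Posting both outputs in the question makes it much easier for others to see where the combination breaks.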

Posted 2022/07/29 00:31

Edited 2022/07/29 00:33

jbpb0

Total score 7658
