Question edit history
7
Wording fixes
test
CHANGED
File without changes
|
test
CHANGED
@@ -142,6 +142,16 @@
 
 
 
+These results suggest the following.
+
+1. The CUDA version, 10.1, is probably correct.
+
+2. The problem lies between Python and the GPU itself, or in something such as the Python installation itself.
+
+Both points are only suggested, not confirmed.
+
+
+
 With cupy-cuda110
 
 ```python
6
Phrasing fixes
test
CHANGED
File without changes
|
test
CHANGED
@@ -226,11 +226,11 @@
 
   File "<stdin>", line 1, in <module>
 
-  File "/
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/_creation/ranges.py", line 55, in arange
 
     ret = cupy.empty((size,), dtype=dtype)
 
-  File "/
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/_creation/basic.py", line 22, in empty
 
     return cupy.ndarray(shape, dtype, order=order)
 
5
Explanation of the CuPy errors
test
CHANGED
File without changes
|
test
CHANGED
@@ -130,6 +130,126 @@
 
 ```
 
-Addendum 2
+### Addendum 2
 
 I also tried the command for CUDA 11.0 (pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html), but the same error occurred.
+
+
+
+In addition, I installed CuPy as an experiment to test GPU use there as well.
+
+With the cuda110 build, an error caused by a version mismatch was detected. With the cuda101 build, no version-mismatch error appeared, but a memory error occurred when a function was executed.
+
+
+
+With cupy-cuda110
+
+```python
+
+>>> import cupy
+
+Traceback (most recent call last):
+
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/__init__.py", line 21, in <module>
+
+    from cupy import core # NOQA
+
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/core/__init__.py", line 1, in <module>
+
+    from cupy.core import core # NOQA
+
+ImportError: libcublas.so.11: cannot open shared object file: No such file or directory
+
+
+
+During handling of the above exception, another exception occurred:
+
+
+
+Traceback (most recent call last):
+
+  File "<stdin>", line 1, in <module>
+
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/__init__.py", line 42, in <module>
+
+    six.reraise(ImportError, ImportError(msg), exc_info[2])
+
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/six.py", line 702, in reraise
+
+    raise value.with_traceback(tb)
+
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/__init__.py", line 21, in <module>
+
+    from cupy import core # NOQA
+
+  File "/xxx/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/core/__init__.py", line 1, in <module>
+
+    from cupy.core import core # NOQA
+
+ImportError: CuPy is not correctly installed.
+
+
+
+If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
+
+Also, confirm that only one CuPy package is installed:
+
+$ pip freeze
+
+
+
+If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
+
+$ pip install cupy --no-cache-dir -vvvv
+
+
+
+Check the Installation Guide for details:
+
+https://docs.cupy.dev/en/latest/install.html
+
+
+
+original error: libcublas.so.11: cannot open shared object file: No such file or directory
+
+```
+
+With cupy-cuda101
+
+```python
+
+>>> import cupy as cp
+
+>>> x = cp.arange(6).reshape(2, 3).astype('f')
+
+Traceback (most recent call last):
+
+  File "<stdin>", line 1, in <module>
+
+  File "/home/slab/kshono/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/_creation/ranges.py", line 55, in arange
+
+    ret = cupy.empty((size,), dtype=dtype)
+
+  File "/home/slab/kshono/.venv/tff-IwBB_zea/lib/python3.6/site-packages/cupy/_creation/basic.py", line 22, in empty
+
+    return cupy.ndarray(shape, dtype, order=order)
+
+  File "cupy/core/core.pyx", line 138, in cupy.core.core.ndarray.__init__
+
+  File "cupy/cuda/memory.pyx", line 578, in cupy.cuda.memory.alloc
+
+  File "cupy/cuda/memory.pyx", line 1250, in cupy.cuda.memory.MemoryPool.malloc
+
+  File "cupy/cuda/memory.pyx", line 1270, in cupy.cuda.memory.MemoryPool.malloc
+
+  File "cupy/cuda/device.pyx", line 25, in cupy.cuda.device.get_device_id
+
+  File "cupy_backends/cuda/api/runtime.pyx", line 275, in cupy_backends.cuda.api.runtime.getDevice
+
+  File "cupy_backends/cuda/api/runtime.pyx", line 247, in cupy_backends.cuda.api.runtime.check_status
+
+cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory
+
+
+
+```
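The `cupy-cuda110` traceback in this revision fails because the dynamic loader cannot open `libcublas.so.11`. As a rough way to see which cuBLAS builds a host can actually load, the following is a minimal sketch that uses only the standard library's `ctypes`. The helper name `loadable_libraries` and the list of candidate sonames are assumptions made for illustration; they are not part of the original post or of CuPy.

```python
import ctypes


def loadable_libraries(sonames):
    """Return a dict mapping each soname to True if the dynamic loader can open it."""
    result = {}
    for name in sonames:
        try:
            # ctypes.CDLL raises OSError when the library cannot be found or loaded.
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    # Assumed candidates: the cuBLAS sonames shipped with CUDA 10.x and CUDA 11.x.
    candidates = ["libcublas.so.10", "libcublas.so.11"]
    for name, ok in loadable_libraries(candidates).items():
        print(f"{name}: {'found' if ok else 'not found'}")
```

On a host that only has the CUDA 10.1 toolkit installed, one would expect `libcublas.so.11` to be reported as not found, which would be consistent with the import error above. This check says nothing about the out-of-memory error seen with `cupy-cuda101`.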
4
Added a note on installing for CUDA 11.0
test
CHANGED
File without changes
|
test
CHANGED
@@ -129,3 +129,7 @@
 
 
 ```
+
+Addendum 2
+
+I also tried the command for CUDA 11.0 (pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html), but the same error occurred.
3
Supplementary note on the PyTorch installation
test
CHANGED
File without changes
|
test
CHANGED
@@ -88,4 +88,44 @@
 
 #### OS and PyTorch versions
 
-pytorch
+pytorch version 1.7.0+cu101
+
+OS debian 10.6
+
+As far as I recall, I first installed PyTorch the usual way with pip install torch; after seeing the GPU problem, I uninstalled it and reinstalled it with "pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html".
+
+
+
+I checked the CUDA, OS, and PyTorch versions as shown below.
+
+```bash
+
+aoies: ~$ nvcc --version
+
+nvcc: NVIDIA (R) Cuda compiler driver
+
+Copyright (c) 2005-2019 NVIDIA Corporation
+
+Built on Sun_Jul_28_19:07:16_PDT_2019
+
+Cuda compilation tools, release 10.1, V10.1.243
+
+
+
+aoies: ~$ cat /etc/debian_version
+
+10.6
+
+```
+
+```python
+
+>>> import torch
+
+>>> print(torch.__version__)
+
+1.7.0+cu101
+
+
+
+```
2
Added OS and PyTorch version information
test
CHANGED
File without changes
|
test
CHANGED
@@ -31,6 +31,8 @@
 
 
 ### Addendum
+
+#### GPU
 
 Running nvidia-smi returns the output below, so I assumed the GPUs are probably working.
 
@@ -83,3 +85,7 @@
 
 
 ```
+
+#### OS and PyTorch versions
+
+The PyTorch version is 1.7.0+cu101, and the OS is Debian 10.6.
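The version strings quoted in this history can be compared mechanically. The following is a minimal sketch, assuming that the `+cuXYZ` suffix of a PyTorch wheel version encodes the CUDA version the wheel was built for (so `cu101` reads as 10.1), and that `nvcc --version` prints a `release X.Y` field as in the output quoted in the post. The function names `wheel_cuda_version` and `nvcc_release` are hypothetical helpers, not PyTorch or CUDA APIs.

```python
import re


def wheel_cuda_version(torch_version):
    """Extract the CUDA version from a string such as '1.7.0+cu101'."""
    match = re.search(r"\+cu(\d{2,})$", torch_version)
    if match is None:
        return None  # no CUDA tag, e.g. a CPU-only or untagged build
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major version.
    return f"{digits[:-1]}.{digits[-1]}"


def nvcc_release(nvcc_output):
    """Extract 'X.Y' from the 'release X.Y' field of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Both strings are taken from the outputs quoted in the post.
    nvcc_text = "Cuda compilation tools, release 10.1, V10.1.243"
    wheel = wheel_cuda_version("1.7.0+cu101")
    toolkit = nvcc_release(nvcc_text)
    print(f"wheel CUDA: {wheel}, toolkit CUDA: {toolkit}, match: {wheel == toolkit}")
```

Note that the `CUDA Version: 11.0` field in the nvidia-smi output reports what the installed driver supports, not the installed toolkit, which is why it can differ from the release that nvcc reports.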
1
Added the GPU status
test
CHANGED
File without changes
|
test
CHANGED
@@ -27,3 +27,59 @@
 
 
 ```
+
+
+
+### Addendum
+
+Running nvidia-smi returns the output below, so I assumed the GPUs are probably working.
+
+```
+
+nvidia-smi
+
+Tue Nov 10 23:49:33 2020
+
++-----------------------------------------------------------------------------+
+
+| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
+
+|-------------------------------+----------------------+----------------------+
+
+| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
+
+| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
+
+|                               |                      |               MIG M. |
+
+|===============================+======================+======================|
+
+|   0  Tesla K40m          On   | 00000000:03:00.0 Off |                    0 |
+
+| N/A   32C    P8    21W / 235W |      0MiB / 11441MiB |      0%      Default |
+
+|                               |                      |                  N/A |
+
++-------------------------------+----------------------+----------------------+
+
+|   1  Tesla K40m          On   | 00000000:04:00.0 Off |                    0 |
+
+| N/A   31C    P8    21W / 235W |      0MiB / 11441MiB |      0%      Default |
+
+|                               |                      |                  N/A |
+
++-------------------------------+----------------------+----------------------+
+
+|   2  Tesla K40m          On   | 00000000:82:00.0 Off |                    0 |
+
+| N/A   31C    P8    20W / 235W |      0MiB / 11441MiB |      0%      Default |
+
+|                               |                      |                  N/A |
+
++-------------------------------+----------------------+----------------------+
+
+
+
+
+
+
+```