Answer edit history

Revision

2020/07/22 03:54

Posted by

tiitoi

Score 21960

answer CHANGED

@@ -17,6 +17,12 @@
 * Core i7-6700K @ 4.00GHz (index is a numpy array): 27.5 ms
 * Core i7-6700K @ 4.00GHz (index is a list): 442 ms
+If Python objects get mixed into the array processing, for example by using a list as the index, Python-level code runs instead of numpy's optimized C code, so it becomes considerably slower.
+Roughly speaking:
+* A GPU is tens of times faster than a CPU
+* C is about 100 times faster than Python
 [Python - How to measure code execution time - pystyle](https://pystyle.info/python-wall-time-measurement/)
 ### PyTorch benchmark code

Revision

2020/07/22 03:54

Posted by

tiitoi

Score 21960

answer CHANGED

@@ -14,8 +14,8 @@
 * GTX 1080: 1.23 ms
 * GTX 2080: 700 µs
-* CPU (index is a numpy array): 27.5 ms
+* Core i7-6700K @ 4.00GHz (index is a numpy array): 27.5 ms
-* CPU (index is a list): 442 ms
+* Core i7-6700K @ 4.00GHz (index is a list): 442 ms
 [Python - How to measure code execution time - pystyle](https://pystyle.info/python-wall-time-measurement/)

Revision

2020/07/22 03:48

Posted by

tiitoi

Score 21960

answer CHANGED

@@ -1,7 +1,7 @@
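 > I was told that pytorch does not support fancy indexing, so
 It does support it.
-I think it is slow because the data is converted back to numpy before being processed.
+I think it is slow because, on top of converting back to numpy, the indices are turned into Python lists with tolist() before being processed.
 Basically anything you can do with numpy you can also do with PyTorch, so create things like arange() results on the GPU from the start with `torch.arange(..., device="cuda")` rather than with `numpy.arange()`.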
@@ -14,9 +14,13 @@
 * GTX 1080: 1.23 ms
 * GTX 2080: 700 µs
+* CPU (index is a numpy array): 27.5 ms
+* CPU (index is a list): 442 ms
 [Python - How to measure code execution time - pystyle](https://pystyle.info/python-wall-time-measurement/)
+### PyTorch benchmark code
 ```python
 import torch
@@ -33,4 +37,42 @@
 %timeit A[I, J]
 # GTX 1080: 1.23 ms ± 154 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
 # GTX 2080: 700 µs ± 68.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
+```
+### Benchmark code for CPU (index is a numpy array)
+```python
+import numpy as np
+N = 7000
+M = 1500000
+A = np.random.randn(N, N)
+I = np.random.randint(0, N, size=M)
+J = np.random.randint(0, N, size=M)
+print(A.shape, I.shape, J.shape)
+# (7000, 7000) (1500000,) (1500000,)
+%timeit -n100 A[I, J]
+# 27.5 ms ± 57.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
+```
+### Benchmark code for CPU (index is a list)
+```python
+import numpy as np
+N = 7000
+M = 1500000
+A = np.random.randn(N, N)
+I = np.random.randint(0, N, size=M).tolist()
+J = np.random.randint(0, N, size=M).tolist()
+print(A.shape, len(I), len(J))
+# (7000, 7000) 1500000 1500000
+%timeit -n1 A[I, J]
+# 442 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 ```
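
The gap between the array-index and list-index benchmarks above can be explored with a small sketch like the following. It assumes, without confirmation from the answer, that much of the extra time goes into turning the Python lists into arrays, so it times that conversion separately. The sizes are smaller than in the original benchmark so that it runs quickly, and the names `I_arr`, `I_list`, `t_convert` and so on are illustrative only.

```python
import timeit

import numpy as np

# Smaller sizes than the benchmark above so the sketch finishes quickly (assumption).
N = 1000
M = 200_000

rng = np.random.default_rng(0)
A = rng.standard_normal((N, N))
I_arr = rng.integers(0, N, size=M)
J_arr = rng.integers(0, N, size=M)
I_list = I_arr.tolist()  # plain Python lists of Python ints
J_list = J_arr.tolist()

# Both kinds of index select exactly the same elements.
same = np.array_equal(A[I_arr, J_arr], A[I_list, J_list])


def best_time(func, number=5, repeat=3):
    """Return the best average time in seconds for one call of func."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


t_array = best_time(lambda: A[I_arr, J_arr])
t_list = best_time(lambda: A[I_list, J_list])
# Cost of only converting the two lists to arrays, with no indexing at all.
t_convert = best_time(lambda: (np.asarray(I_list), np.asarray(J_list)))

print(f"same result:        {same}")
print(f"array index:        {t_array * 1e3:.2f} ms")
print(f"list index:         {t_list * 1e3:.2f} ms")
print(f"list -> array only: {t_convert * 1e3:.2f} ms")
```

If the conversion time is close to the difference between the two indexing times, converting the indices once with `np.asarray()` and reusing the arrays should remove most of the overhead.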

Revision

2020/07/22 03:46

Posted by

tiitoi

Score 21960

answer CHANGED

@@ -23,9 +23,9 @@
 N = 7000
 M = 1500000
-A = torch.randn(N, N, device="cuda:1")
+A = torch.randn(N, N, device="cuda")
-I = torch.randint(high=N, size=(M,), device="cuda:1")
+I = torch.randint(high=N, size=(M,), device="cuda")
-J = torch.randint(high=N, size=(M,), device="cuda:1")
+J = torch.randint(high=N, size=(M,), device="cuda")
 print(A.shape, I.shape, J.shape)
 # torch.Size([7000, 7000]) torch.Size([1500000]) torch.Size([1500000])
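
As a complement to the `%timeit` measurement, here is a minimal sketch of timing the same fancy indexing with explicit synchronization. CUDA work in PyTorch is queued asynchronously, so waiting with `torch.cuda.synchronize()` before reading the clock is one common way to make sure the queued work is included. The CPU fallback, the smaller sizes, and the helper name `timed_index` are assumptions added for illustration and are not part of the original answer.

```python
import time

import torch

# Fall back to the CPU when no GPU is present, so the sketch runs anywhere (assumption).
device = "cuda" if torch.cuda.is_available() else "cpu"

# Smaller sizes than the original benchmark (assumption).
N = 1000
M = 200_000

# Create everything on the target device from the start, with no numpy round trip.
A = torch.randn(N, N, device=device)
I = torch.randint(high=N, size=(M,), device=device)
J = torch.randint(high=N, size=(M,), device=device)


def timed_index(repeats=100):
    """Return (average seconds per A[I, J], last result)."""
    if device == "cuda":
        torch.cuda.synchronize()  # finish any pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        out = A[I, J]
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the queued kernels to complete
    return (time.perf_counter() - start) / repeats, out


seconds, out = timed_index()
print(out.shape, out.device.type, f"{seconds * 1e3:.3f} ms per loop")
```

The indices never leave the device here, which is the point of the answer: no `.numpy()` or `.tolist()` call appears anywhere in the timed path.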