Cannot place a PyTorch model on the GPU

I'm trying to build a model with PyTorch. As a first step I wrote a module that extracts three intermediate layers from a backbone model, but I can't get this module onto the GPU.

```python
import torch.nn as nn
import torchvision.models as models


class Backbone(nn.Module):
    def __init__(self, backbone_type="ResNet50"):
        super(Backbone, self).__init__()
        self.backbone_type = backbone_type

    def forward(self, x):
        if self.backbone_type == "ResNet50":
            model = models._utils.IntermediateLayerGetter(
                models.resnet50(), {"layer1": 2, "layer2": 3, "layer3": 4}
            )

            return model(x)
```

I wrote the following test code.

```python
import numpy as np
import pytest
import torch

from src.model import Backbone

@pytest.mark.skipif(not torch.cuda.is_available(), reason="There is no GPU")
def test_run_backbone_on_gpu():
    back = Backbone()
    dummy_input = torch.from_numpy(np.random.random((1, 3, 224, 224))).to(
        "cuda", dtype=torch.float
    )
    back = back.cuda()
    dummy_output = back(dummy_input)
    assert dummy_output[2].size() == torch.Size([1, 256, 56, 56])
    assert dummy_output[3].size() == torch.Size([1, 512, 28, 28])
    assert dummy_output[4].size() == torch.Size([1, 1024, 14, 14])
```

Running this test raises the following RuntimeError.

```text
================================================================================================ FAILURES =================================================================================================
________________________________________________________________________________________ test_run_backbone_on_gpu _________________________________________________________________________________________

    @pytest.mark.skipif(not torch.cuda.is_available(), reason="There is no GPU")
    def test_run_backbone_on_gpu():
        back = Backbone()
        dummy_input = torch.from_numpy(np.random.random((1, 3, 224, 224))).to(
            "cuda", dtype=torch.float
        )
        back = back.cuda()
>       dummy_output = back(dummy_input)

tests/test_model.py:43:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:722: in _call_impl
    result = self.forward(*input, **kwargs)
src/model.py:16: in forward
    return model(x)
.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:722: in _call_impl
    result = self.forward(*input, **kwargs)
.venv/lib/python3.8/site-packages/torchvision/models/_utils.py:63: in forward
    x = module(x)
.venv/lib/python3.8/site-packages/torch/nn/modules/module.py:722: in _call_impl
    result = self.forward(*input, **kwargs)
.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py:419: in forward
    return self._conv_forward(input, self.weight)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
input = tensor([[[[0.1313, 0.0234, 0.7552,  ..., 0.3833, 0.4139, 0.8594],
          [0.1304, 0.7056, 0.5246,  ..., 0.4825, 0.6..., 0.3003, 0.3479, 0.5493],
          [0.3813, 0.9502, 0.2774,  ..., 0.6596, 0.5868, 0.9608]]]],
       device='cuda:0')
weight = Parameter containing:
tensor([[[[-4.6613e-02, -9.6919e-04,  8.5332e-03,  ...,  2.0174e-02,
            9.1100e-03, -4....[-2.4699e-02,  1.7219e-02, -1.9792e-02,  ..., -7.8135e-03,
           -3.0772e-02, -1.7609e-02]]]], requires_grad=True)

    def _conv_forward(self, input, weight):
        if self.padding_mode != 'zeros':
            return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                            weight, self.bias, self.stride,
                            _pair(0), self.dilation, self.groups)
>       return F.conv2d(input, weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
E       RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py:415: RuntimeError
========================================================================================= short test summary info =========================================================================================
FAILED tests/test_model.py::test_run_backbone_on_gpu - RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
```

I believe this error means that the input is on the GPU while the model (its weight parameters) is not, but I don't know how to fix it.

Thanks in advance.

meg_

2020/09/01 14:35

Please post the full error message.

Yhaya

2020/09/01 14:39

I've added it. Thank you.


1 answer

Best answer

Solution

Define the model used by Backbone as an attribute in __init__(), not inside forward().

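```python
import torch.nn as nn
import torchvision.models as models


class Backbone(nn.Module):
    def __init__(self, backbone_type="ResNet50"):
        super(Backbone, self).__init__()
        self.backbone_type = backbone_type
        if self.backbone_type == "ResNet50":
            # Assigning to self.model registers it as a child module, so
            # back.cuda() / back.to("cuda") also moves its parameters.
            # It is also built only once, instead of a freshly initialized
            # ResNet-50 being created on every forward() call.
            self.model = models._utils.IntermediateLayerGetter(
                models.resnet50(), {"layer1": 2, "layer2": 3, "layer3": 4}
            )
        else:
            # Fail early instead of hitting an AttributeError in forward()
            raise ValueError(f"unsupported backbone_type: {backbone_type}")

    def forward(self, x):
        return self.model(x)
```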
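To check the pattern without depending on torchvision or on a GPU being present, here is a minimal device-agnostic sketch. `TinyBackbone` is a hypothetical stand-in for `Backbone` (a single `nn.Conv2d` built in `__init__()`); the device is chosen at runtime, the model and the input are both put on it, and every registered parameter is checked to be on the same device type as the input.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for Backbone: the submodule is built once in __init__."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


# Use the GPU when present, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = TinyBackbone().to(device)            # moves every registered parameter
x = torch.rand(1, 3, 32, 32, device=device)  # create the input on the same device

# Sanity check: every parameter lives on the same device type as the input
assert all(p.device.type == x.device.type for p in model.parameters())

out = model(x)
print(out.shape)  # torch.Size([1, 8, 32, 32])
```

With the real `Backbone`, the same check (`all(p.is_cuda for p in back.parameters())` after `back.cuda()`) should hold once the model is assigned in `__init__()`.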

Cause

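In PyTorch, the layers that make up a model inherit from nn.Module.
A model is built by combining several nn.Modules, but an nn.Module used inside another nn.Module is only recognized automatically as its child (a registered submodule) when it is assigned as an attribute (self.xxx = ...); nn.Module.__setattr__ intercepts that assignment and registers the submodule. Doing this in __init__() ensures the child exists before cuda() or to() is called.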

When the submodules are defined as attributes in `__init__()`

```python
import torch


class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Modules used by this model should be defined here as attributes
        self.linear1 = torch.nn.Linear(10, 10)
        self.linear2 = torch.nn.Linear(10, 10)

    def forward(self, x):
        y = self.linear1(x)
        y = self.linear2(y)
        return y


net = Net()
# Both layers are recognized as children
for child in net.children():
    print(child)
# Linear(in_features=10, out_features=10, bias=True)
# Linear(in_features=10, out_features=10, bias=True)
```

When a submodule is not defined as an attribute in `__init__()`

```python
import torch


class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Modules used by this model should be defined here as attributes
        self.linear2 = torch.nn.Linear(10, 10)

    def forward(self, x):
        # linear1 is a local variable: it is not registered as a child, and a
        # new randomly initialized layer is created on every call
        linear1 = torch.nn.Linear(10, 10)
        y = linear1(x)
        y = self.linear2(y)
        return y


net = Net()
# linear1 is not recognized as a child
for child in net.children():
    print(child)
# Linear(in_features=10, out_features=10, bias=True)
```
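A related case that trips up the same registration mechanism: an attribute that is a plain Python list is not an nn.Module, so the layers stored in it are not registered either. The sketch below (the class names are hypothetical) contrasts a plain list with `nn.ModuleList`, which registers every layer it holds.

```python
import torch.nn as nn


class PlainListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list is not an nn.Module: the layers inside are NOT registered
        self.layers = [nn.Linear(10, 10), nn.Linear(10, 10)]


class ModuleListNet(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every layer it holds as a submodule
        self.layers = nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 10)])


print(len(list(PlainListNet().parameters())))   # 0
print(len(list(ModuleListNet().parameters())))  # 4 (weight + bias per Linear)
```

Calling `.cuda()` on `PlainListNet` would therefore leave both layers on the CPU, exactly like the layer created inside `forward()`.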

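to("cuda") and cuda() move the parameters and buffers of the module and, recursively, of all its registered child nn.Modules to the GPU; an nn.Module that is not recognized as a child is simply left where it is.
In the question's code, the return value of models._utils.IntermediateLayerGetter() was not assigned as an attribute in __init__(), so it was not recognized as a child. It is created on the CPU inside forward(), after cuda() has already run, so it never gets transferred to the GPU.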
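The same behavior can be reproduced without a GPU. In this sketch `.double()` stands in for `.cuda()`: both go through the same recursive traversal over registered children, so a dtype mismatch appears instead of the device mismatch in the question. `BuildsInForward` and `BuildsInInit` are hypothetical names; the exact wording of the RuntimeError depends on the PyTorch version.

```python
import torch
import torch.nn as nn


class BuildsInForward(nn.Module):
    """Hypothetical CPU analogue of the question's Backbone."""

    def forward(self, x):
        layer = nn.Linear(10, 10)  # created on every call, never registered, float32
        return layer(x)


class BuildsInInit(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 10)  # registered as a child module

    def forward(self, x):
        return self.layer(x)


x = torch.rand(2, 10, dtype=torch.float64)

ok = BuildsInInit().double()  # .double() converts every registered parameter
print(ok(x).dtype)            # torch.float64

bad = BuildsInForward().double()    # nothing is registered, so nothing is converted
print(len(list(bad.parameters())))  # 0
try:
    bad(x)  # float64 input vs. float32 weight
except RuntimeError as e:
    print("RuntimeError:", e)
```

Checking `len(list(module.parameters()))` like this is a quick way to spot a submodule that was never registered.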

Reading the source code shows how nn.Module implements this.

torch.nn.modules.module — PyTorch 1.6.0 documentation

Posted 2020/09/01 14:48

Edited 2020/09/01 14:50