Data size in CIFAR-10 classification with PyTorch

I am studying CNNs and reading the official PyTorch tutorials. The CIFAR10 classification tutorial defines the following network:

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

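In this network definition, I don't understand why the size of the data passed to the fully connected layers is 16*5*5.
The input is a 32*32 image and pooling is applied twice, so shouldn't it be 16*8*8?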


1 answer

Best answer

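The spatial size shrinks not only at each pooling layer but also at each convolution without padding, so the shape just before the fully connected layers is (16, 5, 5) and the code is correct. Each 5x5 convolution without padding trims 4 pixels from the height and width, and each pooling halves them: 32 -> 28 -> 14 -> 10 -> 5.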
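As a rough sketch of that arithmetic, the snippet below follows one spatial dimension through the two conv + pool stages using the usual output-size formula floor((n + 2p - k) / s) + 1. The helper names `out_size` and `trace` are made up for this illustration and are not part of PyTorch. It also shows that 16*8*8 is what you would get if the convolutions used `padding=2`, which this network does not.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def trace(n, conv_padding=0):
    """Follow one spatial dimension through conv1 -> pool -> conv2 -> pool."""
    sizes = [n]
    for _ in range(2):  # two conv + pool stages
        n = out_size(n, kernel=5, padding=conv_padding)  # like nn.Conv2d(..., 5)
        sizes.append(n)
        n = out_size(n, kernel=2, stride=2)  # like nn.MaxPool2d(2, 2)
        sizes.append(n)
    return sizes


print(trace(32))                  # [32, 28, 14, 10, 5]  -> 16 * 5 * 5 features
print(trace(32, conv_padding=2))  # [32, 32, 16, 16, 8]  -> would be 16 * 8 * 8
```

So the 16*8*8 guess would only hold if the convolutions preserved the image size, for example with `nn.Conv2d(3, 6, 5, padding=2)`.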

There is a tool called pytorch-summary that reports the output shape of each layer. Running it on this network gave the result below.

sksq96/pytorch-summary: Model summary in PyTorch similar to model.summary() in Keras

python
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

summary(net, input_size=(3, 32, 32), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 6, 28, 28]             456
         MaxPool2d-2            [-1, 6, 14, 14]               0
            Conv2d-3           [-1, 16, 10, 10]           2,416
         MaxPool2d-4             [-1, 16, 5, 5]               0
            Linear-5                  [-1, 120]          48,120
            Linear-6                   [-1, 84]          10,164
            Linear-7                   [-1, 10]             850
================================================================
Total params: 62,006
Trainable params: 62,006
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.06
Params size (MB): 0.24
Estimated Total Size (MB): 0.31
----------------------------------------------------------------
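The Param # column can be cross-checked by hand, which also confirms that fc1 really takes 16 * 5 * 5 = 400 inputs. This is only a sketch: the helper names `conv2d_params` and `linear_params` are invented here, and it assumes every layer has a bias term, which is the default for `nn.Conv2d` and `nn.Linear`.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights (in_ch * k * k per output channel) plus one bias per output channel."""
    return in_ch * kernel * kernel * out_ch + out_ch


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


counts = {
    "conv1": conv2d_params(3, 6, 5),
    "conv2": conv2d_params(6, 16, 5),
    "fc1": linear_params(16 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}")
```

These values should line up with the Conv2d and Linear rows and the total in the summary above.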

Posted 2020/11/02 03:42