Background / what I want to achieve

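I want to take only the values from the first stage of torchvision's Faster R-CNN (after roi_heads) and visualize the feature maps,
but I don't know how to get those values in the first place.
The code I am using is the torchvision Faster (Mask) R-CNN tutorial.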

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb#scrollTo=UYDb7PBw55b-

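Addendum:
It seems there is a way to capture this with hooks, but roi_head and the like are
torchvision modules, so I would rather not modify them by hand. Is there any way to do this?
(If not, I will attach the roi_head code at the very bottom, so I would appreciate it if you could tell me a concrete method.)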

Source code (model definition)

python
def model():
    # Model definition
    import torchvision
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.rpn import AnchorGenerator

    # load a pre-trained model for classification and return
    # only the features
    backbone = torchvision.models.mobilenet_v2(pretrained=True).features

    # FasterRCNN needs to know the number of
    # output channels in a backbone. For mobilenet_v2, it's 1280
    # so we need to add it here
    backbone.out_channels = 1280

    # let's make the RPN generate 5 x 3 anchors per spatial
    # location, with 5 different sizes and 3 different aspect
    # ratios. We have a Tuple[Tuple[int]] because each feature
    # map could potentially have different sizes and
    # aspect ratios
    anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                       aspect_ratios=((0.5, 1.0, 2.0),))

    # let's define what are the feature maps that we will
    # use to perform the region of interest cropping, as well as
    # the size of the crop after rescaling.
    # if your backbone returns a Tensor, featmap_names is expected to
    # be [0]. More generally, the backbone should return an
    # OrderedDict[Tensor], and in featmap_names you can choose which
    # feature maps to use.
    # (roi_pooler is not passed to FasterRCNN below, so the default
    # RoI pooler is used)
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(
        featmap_names=['0'],
        output_size=3,
        sampling_ratio=2)

    # put the pieces together inside a FasterRCNN model
    model = FasterRCNN(backbone,
                       # num_classes=11,  # 2
                       num_classes=5,
                       rpn_anchor_generator=anchor_generator)
                       # box_roi_pool=roi_pooler)

    return model

print(model())

Model architecture as printed

What I want is the output of roi_heads, almost at the very bottom.

FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): Sequential(
    (0): ConvBNReLU(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )

### omitted because it is long ###
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(1280, 15, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(1280, 60, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=62720, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=5, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=20, bias=True)
    )
  )
)

Since the model can be printed this cleanly, isn't there a way to extract the output of this roi_heads (up to fc7)?
I would be very grateful for any guidance.
Thank you in advance.

Addendum: roi_head code

https://github.com/pytorch/vision/blob/10d5a55c332771164c13375f445331c52f8de6f1/torchvision/models/detection/roi_heads.py

Addendum: code for the FC7 part

class TwoMLPHead(nn.Module):
    """
    Standard heads for FPN-based models
    Arguments:
        in_channels (int): number of input channels
        representation_size (int): size of the intermediate representation
    """

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x

The full code is here:
https://github.com/pytorch/vision/blob/10d5a55c332771164c13375f445331c52f8de6f1/torchvision/models/detection/faster_rcnn.py

2 answers

Best answer

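I think capturing it with a hook is the usual approach.

If you search for "pytorch 中間層出力" (pytorch intermediate layer output), "pytorch hook", or "register_forward_hook", you should find several kind people who have written up samples.

For example: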
https://ichi.pro/shitteoku-beki-1-tsu-no-pytorch-torikku-49884369363658
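
As a rough illustration of the hook approach, here is a minimal sketch on a toy model. `ToyDetector`, `Head`, and the layer sizes are made up for this example and only imitate the nesting of `roi_heads.box_head.fc7`; the one real API used is `register_forward_hook`. The hook is registered from outside the model, so no library source has to be edited.

```python
import torch
from torch import nn


class Head(nn.Module):
    """Toy stand-in for TwoMLPHead (hypothetical, much smaller sizes)."""

    def __init__(self):
        super().__init__()
        self.fc6 = nn.Linear(8, 4)
        self.fc7 = nn.Linear(4, 4)

    def forward(self, x):
        x = torch.relu(self.fc6(x.flatten(start_dim=1)))
        return torch.relu(self.fc7(x))


class ToyDetector(nn.Module):
    """Toy stand-in for a model that contains a nested head."""

    def __init__(self):
        super().__init__()
        self.box_head = Head()
        self.predictor = nn.Linear(4, 2)

    def forward(self, x):
        return self.predictor(self.box_head(x))


captured = {}


def save_fc7(module, inputs, output):
    # PyTorch calls this right after module.forward() returns.
    captured["fc7"] = output.detach()


toy = ToyDetector()
# Reach the nested layer by attribute path and attach the hook from outside.
handle = toy.box_head.fc7.register_forward_hook(save_fc7)

with torch.no_grad():
    scores = toy(torch.randn(3, 8))

handle.remove()  # stop capturing once we have what we need
print(captured["fc7"].shape)  # torch.Size([3, 4])
```

With the real model, the same pattern would presumably be applied to `model.roi_heads.box_head.fc7`, using the attribute names shown in the printed architecture.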

Posted 2020/11/03 05:06

s-uchi

Total score 101

ImR0305

2020/11/29 08:41

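Thank you for your answer, and sorry for the late reply. Does hooking also work when, as in this case, the model is made up of many functions and sub-models, so that I can get the weights partway through, up to the fc7 part? (For example, if I write the hook code inside TwoMLPHead(), where fc7 is defined, will I get the value after fc7 in TwoMLPHead(), computed after the input has gone through all the layers of the whole model that come before fc7?) Sorry for my poor Japanese; thank you in advance.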

s-uchi

2020/11/29 15:25

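If I understand your Japanese correctly, then yes. A hook gives you access to the value computed at the moment the forward pass (backbone -> rpn -> roi_heads) goes through the layer you registered it on. In this case the input passes through the backbone, then the RPN, then enters roi_heads, and you should be able to extract the fc7 feature vector, where the features have been nicely condensed. The hook code goes on the side that calls the model (roughly where you wrote print(model())), not inside TwoMLPHead().

On a separate note, what you want to look at is "the output of fc7 = a 1024-dimensional vector", right?

> I want to take only the values from the first stage of torchvision's Faster R-CNN (after roi_heads) and
> visualize the feature maps, but I don't know how to get those values in the first place.

The wording here is confusing, and I am no longer sure which of these values you want:

- the output of the first stage of Faster R-CNN = the backbone feature maps
- the output (after roi_heads) = the final predictions, cls_score and bbox_pred
- your latest addendum = the 1024-dimensional feature vector

Also, if hooks are hard to understand, there are two alternatives:

- Create a MyMLPHead class that inherits from TwoMLPHead and override forward so that it exposes the fc7 output.
- If inheritance is also a struggle, modify TwoMLPHead directly:

```python
def forward(self, x):
    x = x.flatten(start_dim=1)
    x = F.relu(self.fc6(x))
    out = F.relu(self.fc7(x))
    # returns (fc7 features, fc6 features); callers of box_head must unpack the tuple
    return out, x
```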
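
The subclassing alternative mentioned above could look roughly like the sketch below. `MyMLPHead` and its `last_features` attribute are hypothetical names, and `TwoMLPHead` is copied locally from the question so the snippet runs on its own. Unlike the direct edit, this version keeps the return type unchanged and stores the fc7 features on the side, so code that calls the head should not need to change.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TwoMLPHead(nn.Module):
    """Local copy of the class quoted in the question."""

    def __init__(self, in_channels, representation_size):
        super().__init__()
        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))
        return x


class MyMLPHead(TwoMLPHead):
    """Hypothetical subclass: same computation, but remembers the last output."""

    def __init__(self, in_channels, representation_size):
        super().__init__(in_channels, representation_size)
        self.last_features = None

    def forward(self, x):
        out = super().forward(x)
        self.last_features = out.detach()  # copy kept outside the autograd graph
        return out  # same return type as the parent class


head = MyMLPHead(in_channels=16, representation_size=8)
features = head(torch.randn(5, 4, 2, 2))  # 4 * 2 * 2 = 16 inputs per RoI
print(head.last_features.shape)  # torch.Size([5, 8])
```

In the real model such a head would presumably be assigned to `model.roi_heads.box_head` with sizes matching the printout (62720 and 1024); that wiring is an assumption and is not shown here.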

ImR0305

2020/11/30 02:04

Thank you for your reply! Sorry for my poor Japanese. What I wanted to do was simply to extract intermediate-layer features, so your answer was very helpful and I was able to implement it!

I was able to implement it with the following code.

python
# Called every time forward is called
class SaveOutput:
    def __init__(self):
        self.outputs = []

    def __call__(self, module, module_in, module_out):
        self.outputs.append(module_out)

    def clear(self):
        self.outputs = []


save_output = SaveOutput()
hook_handles = []

# Output of fc7
# (`model` here is the FasterRCNN instance, not the model() function above)
layer = model.roi_heads.box_head.fc7
handle = layer.register_forward_hook(save_output)
hook_handles.append(handle)

# Filled in once the model has run a forward pass
save_output.outputs
22save_output.outputs