[pytorch] How can I get the output of an intermediate layer in a deeply nested model?

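Background / what I want to achieve

I want to take only the output of the first stage (after roi_heads) of torchvision's Faster R-CNN and use it as a feature map.
The code I am using is the torchvision Faster (Mask) R-CNN tutorial.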

https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb#scrollTo=UYDb7PBw55b-

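Apparently this can be done by attaching a hook, but roi_head and the like are torchvision modules,
so I would rather not modify them by hand. Is there a way to do this?
(If that is difficult, I have attached the roi_head code at the bottom, so I would appreciate concrete instructions.)

Model architecture as printed with print

What I want is the output of roi_heads, which is almost at the very bottom.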

FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): Sequential(
    (0): ConvBNReLU(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )

### omitted for brevity ###
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(1280, 15, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(1280, 60, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=62720, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=5, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=20, bias=True)
    )
  )
)

#####Since the model prints out this cleanly, is there a way to extract the output up to roi_heads (through fc7) directly?
Any guidance would be greatly appreciated.
Thank you in advance.

##roi_heads code
https://github.com/pytorch/vision/blob/10d5a55c332771164c13375f445331c52f8de6f1/torchvision/models/detection/roi_heads.py

##Addendum
Sorry, here is the code for the fc7 part that I want.

class TwoMLPHead(nn.Module):
    """
    Standard heads for FPN-based models
    Arguments:
        in_channels (int): number of input channels
        representation_size (int): size of the intermediate representation
    """

    def __init__(self, in_channels, representation_size):
        super(TwoMLPHead, self).__init__()

        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x

The full code for this part is here:
https://github.com/pytorch/vision/blob/10d5a55c332771164c13375f445331c52f8de6f1/torchvision/models/detection/faster_rcnn.py
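
For reference, the shapes flowing through this head can be traced with a small standalone sketch. The class below copies the TwoMLPHead definition quoted above so that it runs without torchvision. The 1280 channels and the 7x7 pooled size are assumptions inferred from the printed architecture (1280 * 7 * 7 = 62720 matches the in_features of fc6). The representation size is shrunk from 1024 to 32 to keep the sketch light, and the four random RoI features are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoMLPHead(nn.Module):
    """Copy of the head quoted above, so the sketch runs without torchvision."""

    def __init__(self, in_channels, representation_size):
        super().__init__()
        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))
        return x


# Assumed values: 1280 backbone channels and a 7x7 RoI-pooled map.
channels, pooled = 1280, 7
# 32 is a reduced, illustrative size; the printed model uses 1024.
head = TwoMLPHead(channels * pooled * pooled, 32)

# Four hypothetical pooled RoI features, one row per proposal.
roi_features = torch.randn(4, channels, pooled, pooled)
with torch.no_grad():
    box_features = head(roi_features)

print(box_features.shape)  # torch.Size([4, 32])
```

The head therefore returns one feature vector per proposal rather than a spatial map; with the real model each row would have 1024 entries.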


1 answer

Self-resolved

I was able to implement it with the following code.

python
1#forwordが呼ばれるたびに呼ばれる
2class SaveOutput:
3    def __init__(self):
4        self.outputs = []
5        
6    def __call__(self, module, module_in, module_out):
7        self.outputs.append(module_out)
8        
9    def clear(self):
10        self.outputs = []
11
12
13save_output = SaveOutput()
14hook_handles = []
15
16#fc7の出力
17layer=model.roi_heads.box_head.fc7
18handle = layer.register_forward_hook(save_output)
19hook_handles.append(handle)
20
21
22save_output.outputs