GPU does not appear to be used when running CNN inference on Jetson Nano

Question

Hello, and thank you in advance.

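I am doing image processing on the edge with a Jetson Nano, but inference does not seem to be running on the GPU: the GPU program is slower at inference than the CPU program.

Following the measurements described under "What I tried" below, I carried out the checks described under "What I checked".

My guess is that inference is not actually running on the GPU, and that this is why it takes so long.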

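So my question is: is this program written correctly in the first place?

I followed reference material and the official tutorials, but I am still not sure, so I would first like to narrow down where the problem lies.

If the cause turns out to be the Jetson's configuration, I would also appreciate pointers to any helpful references.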

What I tried

I measured the inference time with two programs: one that runs on the CPU and one that runs on the GPU.

GPU version
# coding: utf-8

# Import packages
import numpy as np
import json
from PIL import Image
import torch
import torchvision
from torchvision import models, transforms

print("PyTorch Version: ", torch.__version__)
print("Torchvision Version: ", torchvision.__version__)

use_pretrained = True  # use the pretrained parameters
net = models.vgg16(pretrained=use_pretrained)
net.eval()  # set to inference mode

# Preprocessing class for input images
class BaseTransform():
    def __init__(self, resize, mean, std):
        self.base_transform = transforms.Compose([
            transforms.Resize(resize),  # shorter side is scaled to `resize`
            transforms.CenterCrop(resize),  # crop resize x resize from the image center
            transforms.ToTensor(),  # convert to a Torch tensor
            transforms.Normalize(mean, std)  # normalize the color channels
        ])

    def __call__(self, img):
        return self.base_transform(img)

# Check that image preprocessing works
# 1. Load the image
image_file_path = './data/goldenretriever-3724972_640.jpg'
img = Image.open(image_file_path)  # [height][width][RGB]

# 2. Preprocess the image
resize = 224
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
transform = BaseTransform(resize, mean, std)
img_transformed = transform(img)  # torch.Size([3, 224, 224])

# Load the ILSVRC label information used by the post-processing class
ILSVRC_class_index = json.load(open('./data/imagenet_class_index.json', 'r'))

# Post-processing class that predicts a label from the model output
class ILSVRCPredictor():
    def __init__(self, class_index):
        self.class_index = class_index

    def predict_max(self, out):
        # Copy the output back to the CPU and take the highest-scoring class
        maxid = np.argmax(out.detach().cpu().numpy())
        predicted_label_name = self.class_index[str(maxid)][1]

        return predicted_label_name

# Check whether the GPU is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Device in use:", device)

# Move the network to the GPU
net = net.to(device)

# Predict the label of a local image with the pretrained VGG model
# Load the ILSVRC label information into a dictionary
ILSVRC_class_index = json.load(open('./data/imagenet_class_index.json', 'r'))
predictor = ILSVRCPredictor(ILSVRC_class_index)

# Load the input image
image_file_path = './data/goldenretriever-3724972_640.jpg'
img = Image.open(image_file_path)  # [height][width][RGB]

# After preprocessing, add the batch dimension
transform = BaseTransform(resize, mean, std)  # create the preprocessing object
img_transformed = transform(img)  # torch.Size([3, 224, 224])
inputs = img_transformed.unsqueeze_(0)  # torch.Size([1, 3, 224, 224])

# Send the data to the GPU if one is available
inputs = inputs.to(device)

# Feed the input to the model and convert the model output to a label
out = net(inputs)  # torch.Size([1, 1000])
result = predictor.predict_max(out)

# Print the prediction
print("Prediction for the input image:", result)

What I checked

Checking where in the program the time is spent

out = net(inputs)  # torch.Size([1, 1000])

I confirmed by measurement that this line is where the processing time goes.
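
One caveat about this measurement: PyTorch launches CUDA kernels asynchronously, and the first forward pass on a GPU also pays one-time setup costs such as CUDA context creation, so a single wall-clock measurement around net(inputs) can be misleading. A minimal sketch of a fairer measurement follows. The helper time_inference and its parameters warmup, runs, and sync are hypothetical names that are not part of the original program, and the stand-in workload is only there so the sketch runs without a GPU.

```python
import time


def time_inference(fn, warmup=3, runs=10, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    fn     -- zero-argument callable that runs one forward pass
    warmup -- untimed calls made first to absorb one-time setup costs
    runs   -- number of timed calls to average over
    sync   -- optional callable that blocks until queued device work finishes
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work is finished before starting the clock
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to finish before stopping the clock
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in workload (an assumption, used so the sketch runs without a GPU).
    mean_s = time_inference(lambda: sum(i * i for i in range(10000)))
    print("mean seconds per call: {:.6f}".format(mean_s))
```

With the program above, this would presumably be called as time_inference(lambda: net(inputs), sync=torch.cuda.synchronize) inside a with torch.no_grad(): block. Both torch.cuda.synchronize and torch.no_grad are existing PyTorch calls, but this usage has not been run on the Jetson in question.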

Checking that the framework is installed correctly

jetson@jetson-desktop:~$ python3
Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> print(torch.__version__)
1.2.0a0+8554416
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
CUDA available: True
>>> a = torch.cuda.FloatTensor(2).zero_()
>>> print('Tensor a = ' + str(a))
Tensor a = tensor([0., 0.], device='cuda:0')
>>> b = torch.randn(2).cuda()
>>> print('Tensor b = ' + str(b))
Tensor b = tensor([0.4261, 2.1705], device='cuda:0')
>>> c = a + b
>>> print('Tensor c = ' + str(c))
Tensor c = tensor([0.4261, 2.1705], device='cuda:0')
>>> import torchvision
>>> print(torchvision.__version__)
0.2.2

https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/
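
The session above shows that CUDA tensors work in general, but not that this particular model and input ended up on the GPU. A minimal sketch of such a check follows. devices_in_use is a hypothetical helper that relies only on the .parameters() method and the .device attribute that PyTorch modules and tensors provide; the stand-in objects are assumptions so that the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace


def devices_in_use(module, *tensors):
    """Return (parameter devices, tensor devices) as two sets of strings."""
    param_devices = {str(p.device) for p in module.parameters()}
    tensor_devices = {str(t.device) for t in tensors}
    return param_devices, tensor_devices


if __name__ == "__main__":
    # Stand-ins that mimic the two attributes used above (assumption: no PyTorch here).
    fake_net = SimpleNamespace(parameters=lambda: [SimpleNamespace(device="cuda:0")])
    fake_input = SimpleNamespace(device="cpu")
    print(devices_in_use(fake_net, fake_input))
```

In the GPU program, calling devices_in_use(net, inputs) just before out = net(inputs) should report the same CUDA device in both sets if the transfers worked as intended.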