Keeping video playback as fast as possible while displaying real-time deep-learning object detection results

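Background / what I want to achieve

Using Python and OpenCV, I want to run object detection on captured frames with a model trained in TensorFlow,
and play the result back as a video with the detection results drawn on each frame, while lowering the video's FPS as little as possible.

However, I don't know how to structure the processing. Could you share know-how or best practices, such as buffering or parallel processing?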

Problem / error message

The video does play, but playback is very slow.

Relevant source code

python
import cv2
import numpy as np
import tensorflow as tf

# detection_graph, category_index and vis_util are not shown in the question;
# presumably they are set up as in the TensorFlow Object Detection API tutorial.

cap = cv2.VideoCapture("sample.mp4")

frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
frame_rate = int(cap.get(cv2.CAP_PROP_FPS))

for i in range(frame_count):
    is_read, frame = cap.read()
    if not is_read:
        break
    image_np = frame
    #############↓ slow section ↓###############
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
    #############↑ slow section ↑###############
    cv2.imshow("player", image_np)
    # Press Esc while the player window is focused to quit
    if cv2.waitKey(1) == 27:
        break
cap.release()
cv2.destroyAllWindows()

What I tried

If I remove the part of the source code above between the two "#############" marker comments (the slow section),
I have confirmed that the video itself plays back smoothly.

Additional information (framework/tool versions, etc.)

Python 3.5.5
OpenCV 3.4.1

The video file's frame rate is 30 fps and the image size is 512×512.

The object detection model uses the following network:
https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config

The machine specs are as follows:
MacBook Pro (Retina, 13-inch, Early 2015)
2.7 GHz Intel Core i5
8GB 1867 MHz DDR3


3 answers

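As a first step, does it improve if you move all the TensorFlow processing except the sess.run and vis_util.visualize_boxes_and_labels_on_image_array lines out of the loop, to before the for statement? As posted, a tf.Session is created and the tensors are looked up again on every frame.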
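
A minimal sketch of that suggestion, assuming the graph is loaded as in the question. The names `lookup_tensors`, `detect_frames` and the `session_factory` parameter are hypothetical and introduced only for illustration; in real use `session_factory` would be `tf.Session` and `frames` would come from `cap.read()`. The tensor lookups and the session are set up once, and only `sess.run` stays inside the loop.

```python
FETCH_NAMES = (
    "detection_boxes:0",
    "detection_scores:0",
    "detection_classes:0",
    "num_detections:0",
)


def lookup_tensors(graph):
    """Resolve the input and output tensors once, before the loop."""
    image_tensor = graph.get_tensor_by_name("image_tensor:0")
    fetches = [graph.get_tensor_by_name(name) for name in FETCH_NAMES]
    return image_tensor, fetches


def detect_frames(frames, graph, session_factory):
    """Yield (frame, boxes, scores, classes, num) for each frame.

    session_factory is assumed to behave like tf.Session: it is called as
    session_factory(graph=graph) and used as a context manager.
    """
    image_tensor, fetches = lookup_tensors(graph)
    with session_factory(graph=graph) as sess:  # created once, not per frame
        for frame in frames:
            # Wrapping the frame in a list adds the batch dimension, giving
            # the same shape as np.expand_dims(frame, axis=0).
            boxes, scores, classes, num = sess.run(
                fetches, feed_dict={image_tensor: [frame]})
            yield frame, boxes, scores, classes, num
```

The visualization call and `cv2.imshow` would stay in the caller's loop over `detect_frames(...)`, so the per-frame work is reduced to one `sess.run` plus drawing.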

Posted 2018/11/13 23:41

tmp

Total score 277

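You will need some tricks such as offloading the detection to a separate thread. It is not easy, but I implemented just the rough structure of the detector side (the detection logic itself is not implemented).

It uses threading for the asynchronous processing and queue as the task pipe. Python's queue can be used to synchronize threads, so it is handy when used well. The detection itself is implemented in run, but the intended structure is that a task is started through the detect method and the result is collected with getResult.

At the very bottom there is some code that tries out this class. Can you follow how it behaves?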

Python
import threading
import queue
import time
import numpy


class Detector(threading.Thread):
    """Asynchronous object detector."""

    def __init__(self):
        threading.Thread.__init__(self)
        self._lock = threading.Lock()

        # Pipes for tasks and results
        self._task_queue = queue.Queue(maxsize=1)
        self._result_queue = queue.Queue(maxsize=1)

        # Start the thread right away
        self.busy = False
        self.start()

    def join(self):
        """Shut the worker down."""
        self._task_queue.put(None)
        super(Detector, self).join()

    def run(self):
        """Main part that performs the detection."""
        while True:
            # Wait for a task without holding the lock
            task = self._task_queue.get()
            if task is None:
                # Exit condition for the thread
                break
            print("get task", task, time.ctime())

            # The heavy processing goes here
            time.sleep(3)

            # Push the result into the queue
            with self._lock:
                if self._result_queue.full():
                    # Discard a result that was never collected
                    self._result_queue.get()
                self._result_queue.put("done {}".format(task))
                self.busy = False
                print("finished task", task, time.ctime())
        print("thread finished", time.ctime())

    def detect(self, img: numpy.ndarray) -> bool:
        """Accept a detection request asynchronously."""
        with self._lock:
            if self.busy:
                return False
            self.busy = True  # refuse further requests until this one is done
        self._task_queue.put(img)
        return True

    def getResult(self) -> tuple:
        """Collect a result if one is ready."""
        with self._lock:
            if not self._result_queue.empty():
                return True, self._result_queue.get()
        return False, []


if __name__ == "__main__":
    det = Detector()
    print(det.detect("hoge"))  # request a detection
    time.sleep(1)
    print(det.detect("foo"))  # a request made while busy is rejected
    time.sleep(3)  # wait for the first task to finish
    done, result = det.getResult()  # try to collect the result
    if done:
        # the task has completed
        print("result ", result)
    # once the previous task is done, the next request is accepted
    print(det.detect("bar"))
    det.join()
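
As a rough sketch of how the playback loop might use such a class: every frame is displayed, a frame is handed to the detector only when it accepts one, and the most recent result is reused in between. The function `playback_loop` and the `show` callback are hypothetical names; the only assumption about `detector` is that it offers `detect(img) -> bool` and `getResult() -> (bool, result)` like the Detector class above.

```python
def playback_loop(frames, detector, show):
    """Display every frame; run detection only when the detector is idle.

    show(frame, result) is a hypothetical callback that draws the most
    recent result (None until the first one arrives) and displays the frame.
    Returns the number of frames that were actually submitted for detection.
    """
    latest = None
    submitted = 0
    for frame in frames:
        # Hand the frame over only if the detector accepts it.
        if detector.detect(frame):
            submitted += 1
        # Pick up a finished result if there is one.
        done, result = detector.getResult()
        if done:
            latest = result
        # Frames without a fresh result reuse the previous one.
        show(frame, latest)
    return submitted
```

With real OpenCV frames, `show` would draw the boxes and call `cv2.imshow`. If it draws on the frame in place, passing `frame.copy()` to `detect` keeps the worker from seeing a half-drawn image.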

Posted 2018/11/13 12:10

Edited 2018/11/13 14:47

tachikoma

Total score 3601

Best answer

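Deep learning should basically be run on a GPU, but since this is a Mac, adding one is probably not an option, right?
So what you can do with the current setup is probably to run detection only once every 2 to 3 frames and, on the frames where detection is skipped, simply reuse the most recent detection result.
For example, running detection only once every 2 frames should by itself roughly double the FPS, since detection dominates the per-frame cost.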
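
The idea could be sketched like this, under the assumption that `detect` is whatever function wraps the `sess.run` call. The names `detect_every_n` and `estimated_fps` are hypothetical, and the timing model is only a back-of-the-envelope estimate.

```python
def detect_every_n(frames, detect, every=2):
    """Yield (frame, result), calling detect only on every `every`-th frame."""
    if every < 1:
        raise ValueError("every must be >= 1")
    latest = None
    for index, frame in enumerate(frames):
        if index % every == 0:
            latest = detect(frame)  # the expensive call
        # Skipped frames reuse the most recent result.
        yield frame, latest


def estimated_fps(detect_ms, other_ms, every):
    """Rough FPS estimate when detection cost is spread over `every` frames."""
    return 1000.0 / (other_ms + detect_ms / every)
```

With made-up costs of 90 ms for detection and 10 ms for everything else, this estimate goes from 10 FPS at `every=1` to about 18 FPS at `every=2`. The gain approaches a full doubling only when the non-detection work is negligible.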

Posted 2018/11/13 11:49