I don't know how to train on a large amount of data with TensorFlow

###Situation and goal
I am trying to train an image classifier with TensorFlow.
I want to train on a large dataset, on the order of 1,000 to 10,000 images in total.

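###Program I wrote for training
These are the articles I referred to when writing the program.
-I tried building a "Friends classifier" for "Kemono Friends" with TensorFlow
-Identifying the production company of the anime YuruYuri with TensorFlow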

```python
import os
import cv2
import numpy as np
import tensorflow as tf

path = os.getcwd() + '/data/'
folder_list = os.listdir(path)

# One sub-folder per class
NUM_CLASSES = len(folder_list)
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE*IMAGE_SIZE*3

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('label', 'label.txt', 'File name of label')
flags.DEFINE_string('train_dir', './', 'Directory to put the training data.')
flags.DEFINE_integer('max_steps', 100, 'Number of steps to run trainer.')
flags.DEFINE_integer('batch_size', 20, 'Batch size. '
                     'Must divide evenly into the dataset sizes.')
flags.DEFINE_float('learning_rate', 1e-4, 'Initial learning rate.')

# Build the prediction model
def inference(images_placeholder, keep_prob):
    # Initialize weights from a truncated normal distribution (stddev 0.1)
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    # Initialize biases to the constant 0.1
    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    # Convolution layer
    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    # Pooling layer
    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    # Reshape the input to 28x28x3
    x_image = tf.reshape(images_placeholder, [-1, 28, 28, 3])

    # Convolution layer 1
    with tf.name_scope('conv1') as scope:
        W_conv1 = weight_variable([5, 5, 3, 32])
        b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

    # Pooling layer 1
    with tf.name_scope('pool1') as scope:
        h_pool1 = max_pool_2x2(h_conv1)

    # Convolution layer 2
    with tf.name_scope('conv2') as scope:
        W_conv2 = weight_variable([5, 5, 32, 64])
        b_conv2 = bias_variable([64])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

    # Pooling layer 2
    with tf.name_scope('pool2') as scope:
        h_pool2 = max_pool_2x2(h_conv2)

    # Fully connected layer 1
    with tf.name_scope('fc1') as scope:
        W_fc1 = weight_variable([7*7*64, 1024])
        b_fc1 = bias_variable([1024])
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
        # Dropout
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # Fully connected layer 2
    with tf.name_scope('fc2') as scope:
        W_fc2 = weight_variable([1024, NUM_CLASSES])
        b_fc2 = bias_variable([NUM_CLASSES])

    # Normalize with softmax
    with tf.name_scope('softmax') as scope:
        y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    # Return something like a probability for each label
    return y_conv

# Compute the loss (note: `logits` here is the softmax output of inference())
def loss(logits, labels):
    cross_entropy = -tf.reduce_sum(labels*tf.log(logits))
    tf.summary.scalar("cross_entropy", cross_entropy)
    return cross_entropy

# Define the training op
def training(loss, learning_rate):
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
    return train_step

# Compute the accuracy
def accuracy(logits, labels):
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)
    return accuracy

if __name__ == '__main__':
    count = 0
    folder_list = os.listdir(path)

    train_image = []
    train_label = []
    test_image = []
    test_label = []

    # Write one class name per line while loading the images
    with open(FLAGS.label, 'w') as f:
        for folder in folder_list:
            subfolder = os.path.join(path, folder)
            file_list = os.listdir(subfolder)

            filemax = len(file_list)

            # train : test = 9 : 1
            file_rate = int(filemax/10*9)

            for i, file in enumerate(file_list):
                img = cv2.imread(os.path.join(subfolder, file))
                img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
                # One-hot label for the current class
                tmp = np.zeros(NUM_CLASSES)
                tmp[count] = 1
                if i <= file_rate:
                    train_image.append(img.flatten().astype(np.float32)/255.0)
                    train_label.append(tmp)
                else:
                    test_image.append(img.flatten().astype(np.float32)/255.0)
                    test_label.append(tmp)

            f.write(folder + '\n')
            count = count + 1

    train_image = np.asarray(train_image)
    train_label = np.asarray(train_label)
    test_image = np.asarray(test_image)
    test_label = np.asarray(test_label)

    with tf.Graph().as_default():
        # Placeholder for the images
        images_placeholder = tf.placeholder(tf.float32, shape=(None, IMAGE_PIXELS))
        # Placeholder for the labels
        labels_placeholder = tf.placeholder(tf.float32, shape=(None, NUM_CLASSES))
        # Placeholder for the dropout keep probability
        keep_prob = tf.placeholder(tf.float32)

        # Build the model with inference()
        logits = inference(images_placeholder, keep_prob)
        # Compute the loss with loss()
        loss_value = loss(logits, labels_placeholder)
        # Build the training op with training()
        train_op = training(loss_value, FLAGS.learning_rate)
        # Accuracy
        acc = accuracy(logits, labels_placeholder)

        # Prepare for saving
        saver = tf.train.Saver()
        # Create the Session
        sess = tf.Session()
        # Initialize the variables
        sess.run(tf.global_variables_initializer())
        # Set up the values shown in TensorBoard
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)

        # Run the training
        for step in range(FLAGS.max_steps):
            for i in range(int(len(train_image)/FLAGS.batch_size)):
                # Train on batch_size images at a time
                batch = FLAGS.batch_size*i
                # feed_dict specifies the data fed into the placeholders
                sess.run(train_op, feed_dict={
                    images_placeholder: train_image[batch:batch+FLAGS.batch_size],
                    labels_placeholder: train_label[batch:batch+FLAGS.batch_size],
                    keep_prob: 0.5})

            # Compute the accuracy after every step
            train_accuracy = sess.run(acc, feed_dict={
                images_placeholder: train_image,
                labels_placeholder: train_label,
                keep_prob: 1.0})
            print("step %d, training accuracy %g" % (step, train_accuracy))

            # Add the TensorBoard values after every step
            summary_str = sess.run(summary_op, feed_dict={
                images_placeholder: train_image,
                labels_placeholder: train_label,
                keep_prob: 1.0})
            summary_writer.add_summary(summary_str, step)

    # After training, print the accuracy on the test data
    print("test accuracy %g" % sess.run(acc, feed_dict={
        images_placeholder: test_image,
        labels_placeholder: test_label,
        keep_prob: 1.0}))

    # Save the final model
    save_path = saver.save(sess, "./model.ckpt")
```

###Problem
Below is the output from running the program above (with 200 images per class).
No memory errors or similar were reported.

step 0, training accuracy 0.112175
step 1, training accuracy 0.112175
step 2, training accuracy 0.112175
...

As shown above, the "training accuracy" value does not change as training proceeds, so training does not seem to be working.

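###What I tried
With 100 images per class, training succeeded.
With 200 images per class, the "training accuracy" value did not change and training failed.
Also, although I do not know whether it is related to this problem, the "training accuracy" value sometimes did not change even with 100 images. Deleting the "model.ckpt" that is saved when the program finishes seemed to fix this.
I also tried 1,000 images per class, but training failed.
I tried both the CPU and GPU versions of TensorFlow, and both gave the same result as above.
I also tried computing the accuracy in batches, as shown in the article "TensorFlow: How to train on a large number of images... (mostly) solved edition", but this also gave the same result as above.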

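###Additional information
Development environment
-Windows10(64bit)
-Python3.5.0 (virtual environment) (Anaconda4.4.0(64bit))
-TensorFlow1.0.0 (CPU version)
-TensorFlow1.0.0 (GPU version)

This is an article that describes what I am doing in detail.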


1 answer

Best answer

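2017-11-09: Incorporated feedback from Konnyaku-san.

It comes out as a suggestive-looking straight line (the relationship is actually exponential)!

Konnyaku-san, it may depend on the case, but I think this trend is probably quite a remarkable result!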

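This answer is based on intuition:

・Image size
IMAGE_SIZE = 28 means the classifier has to decide from a 28 px square image. Even for human faces, telling who someone is at that size would be fairly hard.
How about making the images a little larger (for example 48 px, or 64 px = 2^6)?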
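
Note that the code in the question hard-codes 28 in `tf.reshape` and `7*7*64` in the `fc1` layer, so those values would have to change together with `IMAGE_SIZE`. The following is a minimal sketch of how the flattened size could be computed, assuming the same structure as the question (two 2x2 max pools with stride 2 and `'SAME'` padding, 64 output channels). The helper name `flattened_size` is hypothetical.

```python
import math

def flattened_size(image_size, num_pools=2, channels=64):
    """Number of features entering the first fully connected layer."""
    side = image_size
    for _ in range(num_pools):
        # A stride-2 pool with 'SAME' padding outputs ceil(side / 2) per dimension.
        side = math.ceil(side / 2)
    return side * side * channels

for size in (28, 48, 64):
    print(size, flattened_size(size))
```

For `IMAGE_SIZE = 28` this gives `7*7*64`, matching the value already in the question's code.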

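・Learning rate
I recently saw a similar case in a different question:

> When I set the learning rate to 1e-5, the training accuracy started to rise normally.
>
> Is that because this learning rate happened to be appropriate this time?
> Also, with learning rates of 1e-4 or 1e-8, training did not work properly, as in the output above.

It may be better to lower the learning rate to around 1e-5.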
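
One possible mechanism, which is only an assumption and has not been checked against the data in the question: `loss()` takes `tf.log` of the softmax output directly, so if a large update drives any output probability to exactly 0, the loss becomes NaN and training can stop making progress. The NumPy sketch below only illustrates that arithmetic. The function names and the `eps` value are hypothetical.

```python
import numpy as np

def naive_cross_entropy(probs, labels):
    # Same form as loss() in the question: -sum(labels * log(probs))
    with np.errstate(divide='ignore', invalid='ignore'):
        return -np.sum(labels * np.log(probs))

def clipped_cross_entropy(probs, labels, eps=1e-10):
    # Keep probabilities away from 0 before taking the log (eps is an assumed value)
    return -np.sum(labels * np.log(np.clip(probs, eps, 1.0)))

labels = np.array([[0.0, 1.0]])
saturated = np.array([[0.0, 1.0]])  # a softmax output saturated to exactly 0 and 1

print(naive_cross_entropy(saturated, labels))    # nan, because 0 * log(0) = 0 * -inf
print(clipped_cross_entropy(saturated, labels))  # finite
```

If this turns out to be the cause, clipping with `tf.clip_by_value` before `tf.log`, or computing the loss from the pre-softmax values with `tf.nn.softmax_cross_entropy_with_logits`, may be worth trying alongside a lower learning rate.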

Posted 2017/07/16 22:13

Edited 2017/11/09 12:57

Deactivated user


Deactivated user

2017/07/17 10:51

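Thank you for the answer! When I set the learning rate to 1e-5 for the run with 200 images per class, training worked. It seems the learning rate was the problem. When I ran with about 500 images per class I got a memory error, but I solved that by computing the accuracy in batches. For that case, training worked with 1000 steps, a batch size of 20, and a learning rate of 1e-6. However, the model's accuracy did not get very high, so I suspect there are parameters that would train it better. I have not tried changing the image size, so I cannot say anything about it yet, but I will keep it in mind when I work on improving the accuracy. This has given me a feel for how to choose appropriate parameters. I would also like to experiment with changing the network architecture to suit the training data so that I can reach higher accuracy.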
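
The batched accuracy calculation mentioned above could look roughly like the following sketch. It is not the exact code used in the thread. `batched_accuracy` and `predict_fn` are hypothetical names. With the code in the question, `predict_fn` would be something like `lambda x: sess.run(logits, feed_dict={images_placeholder: x, keep_prob: 1.0})`.

```python
import numpy as np

def batched_accuracy(predict_fn, images, labels, batch_size=20):
    """Accuracy computed batch by batch so the whole set is never fed at once."""
    correct = 0
    for start in range(0, len(images), batch_size):
        batch_images = images[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        # predict_fn returns one row of class scores per image
        probs = predict_fn(batch_images)
        correct += int(np.sum(np.argmax(probs, 1) == np.argmax(batch_labels, 1)))
    return correct / len(images)

# Demonstration with a stand-in "model" that simply echoes its input.
demo_labels = np.eye(3)[[0, 1, 2, 1, 0]]
print(batched_accuracy(lambda x: x, demo_labels, demo_labels, batch_size=2))
```

Counting correct predictions per batch and dividing once at the end keeps the result exact even when the last batch is smaller than `batch_size`.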

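Deactivated user

2017/11/09 12:40

Thank you for the valuable feedback. At least as far as the learning rate is concerned, it seems that the more images there are per class, the lower the learning rate has to be for training to work.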

Deactivated user

2017/11/09 21:54

Incidentally:
Forced fit: learning rate = 3.854e^-0.805 * (pictures per class)
Forced simplification: learning rate = e^(-1 * pictures per class)

Deactivated user

2017/11/09 23:09

I see! So there is also this way of choosing the learning rate! I will use it as a reference.

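Deactivated user

2017/11/10 10:28

TensorFlow's behavior is not well understood, so people all over the world are still feeling their way through trial and error. (Although a group of psychics like Google may actually know all sorts of things and be hiding them on purpose.) Many people are struggling without knowing where the sweet spot is, in situations where A fails and B fails but A and B together may look good, so being able to grasp even a little of a trend like this is very valuable.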
