MNIST training does not converge (theano)

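Sorry for the basic question. I am classifying MNIST handwritten digits with theano, but the training does not converge properly.
Following the tutorial at
http://deeplearning.net/tutorial/logreg.html
when I write the loss function as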

```python
-T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
```

the loss decreases more or less monotonically, as in the code below (I have omitted details other than the loss function):

```python
import theano as th
from numpy import *
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original', data_home=".")

x_arr = mnist["data"]
y_arr = mnist["target"]

size = x_arr.shape[0]

x = th.tensor.dmatrix("x").astype(th.config.floatX)
y = th.tensor.dmatrix("y").astype("int32")

W = th.shared(random.rand(784,10).astype(dtype=th.config.floatX), borrow=True)
b = th.shared(random.rand(10).astype(dtype=th.config.floatX), borrow=True)

out = th.tensor.nnet.softmax(x.dot(W) + b)
#loss = -th.tensor.mean(y * th.tensor.log(out + 1e-4))
loss = -th.tensor.mean(th.tensor.log(out + 1e-4)[th.tensor.arange(y.shape[0]), y.flatten()]).astype(th.config.floatX)

idx = th.shared(0).astype(dtype="int32")

x_arr = th.shared(x_arr.astype(dtype=th.config.floatX))
y_arr = th.shared(y_arr.astype(dtype="int32"))

f = th.function(inputs=[idx],
                outputs=loss,
                on_unused_input='ignore',
                updates=[(W, W - 0.0001 * th.grad(loss, W)),
                         (b, b - 0.0001 * th.grad(loss, b))],
                givens=[(x, x_arr[idx:idx+100]), (y, y_arr[idx:idx+100, None])])

for i in range(100):
    for j in range(0, size-100, 100):
        loss = f(j)
    print(loss)
```

Output

```csv
3.4998669624328613
2.970357656478882
2.5788233280181885
2.3907742500305176
2.0066370964050293
2.117086410522461
2.026233434677124
2.0261969566345215
1.9340922832489014
1.473728895187378
1.381466031074524
1.4735664129257202
1.381466031074524
1.4735510349273682
1.4450583457946777
1.5514671802520752
```
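For reference, the tutorial's loss is the mean negative log-likelihood: pairing `arange(N)` with the integer labels selects, for each row, the probability the model assigns to the correct class (this needs a label vector, which is why the code above calls `y.flatten()` on the (N, 1) label matrix). A minimal NumPy sketch of the same indexing, assuming `probs` is an (N, K) array of softmax outputs and `labels` an integer vector; both names and values are illustrative, not from the original code:

```python
import numpy as np

# Illustrative values: 2 examples, 3 classes; each row sums to 1 like a softmax output
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer class index per row

rows = np.arange(labels.shape[0])
# Pairing rows with labels selects probs[0, 0] and probs[1, 2]
picked = probs[rows, labels]
print(picked)  # [0.7 0.8]

# Mean negative log-likelihood, as in the tutorial's loss
nll = -np.mean(np.log(picked))
```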

However, if I write the loss function as

```python
-th.tensor.mean(out * th.tensor.log(y + 1e-4))
```

it does not converge (in this case I convert the labels to a one-hot representation so that they line up with the 10-element vectors). The implementation is below.

```python
import theano as th
from numpy import *
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original', data_home=".")

x_arr = mnist["data"]

idx = mnist["target"]
arr = zeros((idx.shape[0],10)).flatten()
arr[idx.flatten().astype(int) + arange(idx.shape[0]) * int(idx.max())] = 1
arr = arr.reshape(idx.size, 10)
y_arr = arr

size = x_arr.shape[0]

x = th.tensor.dmatrix("x").astype(th.config.floatX)
y = th.tensor.dmatrix("y").astype("int32")

W = th.shared(random.rand(784,10).astype(dtype=th.config.floatX), borrow=True)
b = th.shared(random.rand(10).astype(dtype=th.config.floatX), borrow=True)

out = th.tensor.nnet.softmax(x.dot(W) + b)
loss = -th.tensor.mean(out * th.tensor.log(y + 1e-4))

idx = th.shared(0).astype(dtype="int32")

x_arr = th.shared(x_arr.astype(dtype=th.config.floatX))
y_arr = th.shared(y_arr.astype(dtype="int32"))

f = th.function(inputs=[idx],
                outputs=loss,
                on_unused_input='ignore',
                updates=[(W, W - 0.0001 * th.grad(loss, W)),
                         (b, b - 0.0001 * th.grad(loss, b))],
                givens=[(x, x_arr[idx:idx+100]), (y, y_arr[idx:idx+100])])

for i in range(100):
    for j in range(0, size-100, 100):
        loss = f(j)
    print(loss)
```
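On the one-hot conversion in this code: in a flattened (N, 10) array, the element at row `i`, column `c` has flat index `10 * i + c`, whereas `int(idx.max())` evaluates to 9 for labels 0-9, so from the second row on the ones land in the wrong positions. A minimal NumPy sketch comparing the two strides, with `np.eye(10)[labels]` as one common alternative; the helper names `one_hot_eye` and `one_hot_flat` are hypothetical, for illustration only:

```python
import numpy as np

def one_hot_eye(labels, num_classes=10):
    # Row c of the identity matrix is the one-hot vector for class c
    return np.eye(num_classes)[labels.astype(int)]

def one_hot_flat(labels, stride, num_classes=10):
    # Mirrors the question's approach: set ones in a flattened array, then reshape.
    # The correct stride is the number of columns, num_classes.
    n = labels.shape[0]
    arr = np.zeros(n * num_classes)
    arr[labels.astype(int) + np.arange(n) * stride] = 1
    return arr.reshape(n, num_classes)

labels = np.array([5.0, 0.0, 4.0, 1.0, 9.0])  # float labels, cast to int as in the question

good = one_hot_flat(labels, stride=10)
bad = one_hot_flat(labels, stride=int(labels.max()))  # stride 9, as in the question
print(np.array_equal(good, one_hot_eye(labels)))  # True
print(np.array_equal(bad, one_hot_eye(labels)))   # False
```

With stride 9, some rows (here rows 1 and 3) end up with no 1 at all, while others get two.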

Output

```csv
0.9210340387405115
0.9210340400696587
0.9210340395120877
0.9210340392042159
0.9210340394320792
0.9210340403567177
0.9210340386768939
0.9210340387548388
0.9210340393186343
0.9210340400540475
0.9210340393799111
0.9210340389116061
0.9210340402111538
0.9210340400905459
0.9210340405027893
0.9210340398349144
0.9210340407418887
0.9210340394700124
```
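The printed value is, up to floating-point noise, $-\ln(10^{-4})/10$. A worked check of the arithmetic (a sketch, not a confirmed diagnosis): write the second loss for a batch of $N$ rows and $K = 10$ classes, with $o_{ik}$ the softmax output (so $\sum_k o_{ik} = 1$ for every row) and $y_{ik}$ the target; `th.tensor.mean` averages over all $NK$ entries.

$$
L = -\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K} o_{ik}\,\ln\!\left(y_{ik} + 10^{-4}\right)
$$

If every $y_{ik}$ in the batch is 0, this reduces to a constant:

$$
L = -\frac{\ln(10^{-4})}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K} o_{ik} = -\frac{\ln(10^{-4})}{NK}\cdot N = \frac{4\ln 10}{K} \approx \frac{9.2103}{10} \approx 0.92103
$$

Such a value does not depend on `W` or `b`, so its gradient is zero. With the flat-index conversion in the code above, the largest index that gets a 1 is at most $9 + 9 \times 69999 = 630000$, so rows beyond about 63,000 of the 70,000 receive no 1 at all; the batch whose loss is printed each epoch (the last one, starting at `j = 69800`) would fall in that range.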

I have seen articles that write the loss in the latter form in tensorflow, and it seems to work there.
If anyone knows what is going wrong,
I would appreciate your advice.



Best answer

I think the cross-entropy expression should be the following.

```python
out = th.tensor.nnet.softmax(x.dot(W) + b)
xent = -(1-y) * th.tensor.log(1-out + 1e-8)
loss = xent.mean()
```

The loss is then printed as follows:

```
1.7534503784048883
1.754406510726838
1.756005242674942
1.755949974408118
1.7521872399803033
1.7475800620197386
1.7441947505411761
1.741671160069344
1.7403550574880282
1.7391483742858345
1.7374770824938437
1.7363582456237157
1.7362235338070966
1.7363039307688137
1.7361688543865799
1.7363115484172862
1.7370884312548314
```

I can't say anything about it working in tensorflow.
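For comparison, the usual categorical cross-entropy with one-hot targets weights the log of the predicted probabilities by the targets, i.e. `-mean(sum(y * log(out), axis=1))`; the question's second loss has `out` and `y` in the opposite roles. A minimal NumPy sketch, assuming `probs` is an (N, K) softmax output and `targets` a one-hot (N, K) array (illustrative names and values), checking that this form matches the tutorial's indexing form:

```python
import numpy as np

def xent_one_hot(probs, targets, eps=1e-8):
    """Categorical cross-entropy with one-hot targets.

    The targets weight log(probs); eps only guards against log(0).
    Summing over classes before averaging over rows gives the
    per-example loss, rather than a mean over all N x K entries.
    """
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
targets = np.eye(3)[labels]  # one-hot rows for the integer labels

# Tutorial-style form: log-probability of the correct class per row
nll = -np.mean(np.log(probs[np.arange(labels.shape[0]), labels]))

print(np.isclose(xent_one_hot(probs, targets), nll))  # True
```

In Theano itself, `theano.tensor.nnet.categorical_crossentropy(coding_dist, true_dist)` computes this per row; according to its documentation, `true_dist` may be either one-hot rows or a vector of integer labels.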

Posted 2017/07/30 11:00

Edited 2017/07/30 11:08